2016 5th IIAI International Congress on Advanced Applied Informatics

Proactive-Reactive Auto-Scaling Mechanism for Unpredictable Load Change

R&D Group, Hitachi Ltd., Yokohama, Kanagawa, Japan
R&D Group, Hitachi Ltd., Yokohama, Kanagawa, Japan
Nihon University, Koriyama, Fukushima, Japan

Abstract—Elastic resource scaling for cloud services is widely studied as a way to maintain service performance. Some studies adjust resources in advance based on a predicted workload. However, a real workload can grow regardless of its history, so we focus on the challenge of adapting to such unpredictable load changes. We propose a new auto-scaling mechanism that changes the scale of the target system based on the predicted workload and, in addition, instantly adds resources as a remedy when an unpredictable workload fluctuation is detected. In this paper, we present the design and implementation of the mechanism and evaluate its effect using a real workload history. We confirm that it improves service performance.

Keywords—auto-scaling; load change; service level

978-1-4673-8985-3/16 $31.00 © 2016 IEEE 861
DOI 10.1109/IIAI-AAI.2016.180

I. INTRODUCTION
In recent years, the scale of Internet applications has been expanding, and the number of servers required to operate them has also been increasing [1]. To reduce costs, more companies build their systems on servers provided by cloud providers such as Amazon EC2.

In terms of service operation, service providers need to keep service performance at the level agreed in their Service Level Agreement (SLA). To do so, they must assign enough resources to ensure that performance. However, at the beginning of a service, predicting the appropriate amount of resources is quite difficult. Providing more resources than necessary increases operating costs (over-provisioning). On the other hand, conserving resources sometimes leads to service level violations because of a lack of resources (under-provisioning).

To adapt to workload changes, auto-scaling technologies based on performance monitoring have been widely studied [2][3]. Auto-scaling has been studied for a long time, and it became practical with the growing popularity of cloud services. Because allocating new resources takes time, some of these technologies use load forecasting and task scheduling so that they can adapt to workload changes in advance. However, in actual system operation, the service workload sometimes fluctuates regardless of its history, and some studies have pointed out that accurate workload prediction is a challenge [5]. We have therefore developed a system that realizes auto-scaling even under unpredictable load changes. In this paper, we describe the proposed method and show the evaluation results of experiments using a real workload.

This paper is organized as follows. In Section II, we discuss traditional auto-scaling methods and their challenges. In Section III, we present the proposed method. In Section IV, we describe the evaluation experiments. In Section V, we discuss related research. In Section VI, we conclude the paper.

II. CHALLENGE OF THE CONVENTIONAL AUTO-SCALE METHOD WHEN AN UNPREDICTABLE LOAD CHANGE OCCURS

A. How to change the scalability
First, we explain the methods for changing the scalability of an Information Technology (IT) system. Scale-up/down and scale-out/in are the two basic methods of changing an IT system's scalability depending on the workload [6][7]. The former, scale-up/down, adds or removes resources such as Central Processing Units (CPUs) or memory to or from a single node of the system; replacing a node also falls into this category. To increase or decrease the CPUs and memory of a physical server running a general Operating System (OS), the server has to be stopped. Virtualization technology, however, enables hot-plug and hot-remove without stopping the server. In a system using server virtualization, operators can change the scalability of a Virtual Machine (VM) through software configuration: in a typical server virtualization mechanism, both the memory allocation and the number of CPU cores are configurable by software.

The latter, scale-out/in, adds or removes nodes in the target IT system. A large-scale Web service generally uses several Web servers and a load balancer that distributes the requests according to each Web server's capacity. In such a system, the scalability can be changed by adding nodes and changing the load balancer configuration without stopping the whole system. However, increasing the scalability in this way takes time, because a new server has to be built. Either way, changing the scalability requires build time or server downtime, so it is common practice to plan configuration changes in advance.

Many auto-scaling techniques that use scale-up/down or scale-out/in have been proposed to automate the scalability of IT systems providing Web applications. In these approaches, when the monitored load or the predicted load of the Web/AP servers is likely to exceed a threshold value, or if
a scheduled task is about to start, the auto-scale mechanism decides to change the scalability of the target IT system. In the case of scale-up, the auto-scaling mechanism increases the resources allocated to the VM for the Web/AP server; in the case of scale-out, it adds another VM to the Web/AP cluster and distributes the load to the new VM (Fig. 1).

Fig. 1. Scale-up and scale-out. [Panels: (a) Scale-up: a Web/AP VM with vCPUs and memory; (b) Scale-out: LB, Web/AP, Data Cache, and DB tiers.]

Depending on the type of server virtualization mechanism and the OS, the amount of resources that can be hot-added must be reserved in advance. A method for appropriately estimating this amount of resources has therefore been proposed [12].

B. Challenges of conventional methods
As mentioned in the previous section, existing auto-scaling approaches change the scalability based on load monitoring and forecasting. However, because the load fluctuation of IT systems is not always predictable, existing auto-scaling methods cannot adapt the resource size when the load fluctuates unexpectedly. This causes the following problem:

In the case of a sudden load increase, the system cannot increase its scalability instantly.

III. PROPOSAL OF AN AUTO-SCALE MECHANISM THAT TAKES UNPREDICTABLE LOAD CHANGES INTO ACCOUNT
We focus on the operation of Web applications with unpredictable load changes and propose a method that solves the problem described in Section II. In this section, we describe the outline of the proposed method.

In the proposed method, the managed IT system has a server cluster that includes specific VMs with extra resources, prepared using existing technology [8]. In other words, the resources of these VMs are capped. Under normal conditions, the proposed auto-scaling mechanism changes the scale of the target IT system by scale-out based on workload forecasting; this is the proactive part. If the workload changes unpredictably, the auto-scaling mechanism removes the cap so that the VMs can use their extra resources; this is the reactive part. In this way, a predetermined amount of resources is added to the IT system instantly. After several scale-up operations, however, no capped VMs remain, and the IT system can no longer scale up. To resolve this problem, the auto-scaling mechanism in our proposal creates new VMs with capped resources to prepare for further load increases.

A. Flow of system configuration and processing
The configuration and processing flow of the proposed method are shown in Fig. 2. The system consists of seven main components: the load monitoring module, the load forecasting module, the resource planner module, the scale plan generator module, the scaling director module, the auto-scaling manager that controls all modules, and the virtual environment management and configuration information DB. In Fig. 2, the blue arcs show the proactive process, and the red arcs show the reactive process.

The load monitoring module collects resource usage information from the VMs belonging to the target IT system at a fixed time interval. The load forecasting module calculates the load after a time delta based on the load history stored in the history DB inside the load monitoring module (Step 1). Here, the load means the request arrival rate. To adapt to this load, the resource
planner module calculates the amount of resources to be allocated (Step 2). The scale plan generator module creates an auto-scale plan based on the calculated amount of resources (Step 3). This plan is a sequence of commands that creates a VM, powers it on, and registers it with the load balancer, or, conversely, stops and deletes an existing VM. Next, the scaling director module sends the commands to the virtualization environment management module (Step 5). As a result, a new VM is added to the cluster, or capped resources become available.

Meanwhile, the rule evaluation module checks the current response time and CPU usage. If these metrics exceed certain thresholds, the rule evaluation module instructs the scaling director to scale up (Step 4), and the scaling director sends the scale-up commands (Step 5).

Fig. 2. Configuration of the proposed method. [Components: Load Monitoring (with History DB), Load Forecasting (Step 1), Resource Planner (Step 2), Scale Plan Generator (Step 3), Rule Evaluation (Step 4), Scaling Director (Step 5), and Virtual System Manager, acting on a deployed system (LB, Web/AP, DB) on the cloud infrastructure. Data passed between components: predicted req./sec., required resource, auto-scale plan, service ID, response time, target VM ID, add resource, add/remove VM.]

B. Load Monitoring Method
As mentioned in the previous section, the load monitoring module collects performance metrics from the managed IT system. In the proposed method, the load monitoring module collects the CPU utilization, the memory utilization, and the number of accesses to the Web/AP/DB servers.

To obtain the number of accesses, an agent inside each Web server reads the cumulative number of Hypertext Transfer Protocol (HTTP) requests at a regular interval and calculates the difference between the last two cumulative values. In the TCP/IP implementation of Microsoft Windows-based OSs, which are common client OSs, the default TCP connection timeout is 3 seconds, and a failed connection attempt is retried twice [11]. Since a connection can take 9 seconds in the longest case, counting the change in the number of HTTP accesses at intervals finer than 9 seconds could count the same access twice. We consider half of the longest connection time to be appropriate, so we monitor the number of HTTP accesses at 5-second intervals. To obtain the response time, a probe agent running outside the managed IT system sends a dummy request to the Web application at a predetermined interval. The collected data is stored in the load history DB, from which the auto-scaling manager obtains the load information.

Generally, a managed IT system has multiple Web servers, so the load monitoring module averages the load information of the Web servers. To obtain the number of accesses to the entire IT system, the load monitoring module sums the access numbers of all Web servers. As for the response time, the HTTP accesses from the probe agent are processed by the load balancer. Because the load balancer distributes the requests to the Web servers according to their performance, requests handled by different Web servers return with almost the same turnaround time.

C. Scale-out Method
1) Load forecasting procedure
Predicting the occurrence of events that cause load fluctuations is difficult. We therefore use time-series analysis to predict the load based on the history of the HTTP access rate monitored at predetermined time intervals.

The workload of a Web application generally fluctuates depending on the day of the week and the time of day. On the other hand, it does not follow a perfect cycle, and many workloads also have a trend. These characteristics must be taken into consideration as well. Therefore, we use an ARIMA (autoregressive integrated moving average) model to forecast the load.

The load forecasting module obtains the past request rate for a certain period from the load history DB and uses the ARIMA model to predict the HTTP access rate after a lapse of time t. Here, t is the time required to copy a VM from a template and power it on, plus the additional time required for the load balancer to recognize the VM. Most of this time is spent copying and launching the VM, and it depends on the VM template size and on the applications built into the VM. In the case of scale-out, the auto-scaling manager always uses the same VM template to create the VM, so t is nearly constant for every scale-out. We therefore measured the time required to build a VM and use it as the time t required for resource addition.

In an SLA standard guideline, the required system performance level is 80% for the low level, 90% for the middle level, and 100% for the high level. Since achieving the high level is not realistic, we use 95%, midway between the middle and high levels, as the confidence level of the prediction. As the forecasted HTTP access rate, we use the maximum value within the time range t of either the upper limit of the 95% confidence interval or the forecasted mean (Fig. 3).

Fig. 3. Prediction interval. [Axes: access rate (req./sec.) vs. elapsed time; annotations: max request (95% confidence interval), max request (forecasted mean), and the prediction interval.]

2) The amount of resources decision procedure
The resource planner module calculates the amount of resources required to meet the response performance for the load forecasted by the load forecasting module. According to our past surveys, SugarCRM, a typical business application, suffers performance problems because of a lack of CPU resources. The proposed method focuses on this point: we analyzed in advance the relationship among CPU resources, the HTTP access rate, and the response time. Using this relationship, the resource planner module calculates the amount of resources required to keep the turnaround time within 0.8 seconds at the HTTP access rate calculated by the load forecasting module.

3) VM creation based on the determined resource amount
The scaling director module creates a new VM with the determined resource amount. Service providers usually create the target VM image as a template at service-in. The scaling director module creates a new VM by copying the template with the parameters for the determined resource amount. After the VM is powered on and registered with the load balancer, the scale-out is complete.
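The counting rule of Section III.B (difference of two cumulative HTTP counter readings taken 5 seconds apart, summed over the Web servers, with CPU utilization averaged) can be sketched as follows. This is a minimal sketch, not the authors' agent: the function names, the server names, and the choice to reject a decreasing counter are assumptions for illustration.

```python
from typing import Dict, Tuple

# Sampling interval from Section III.B: about half of the 9-second
# worst-case client connection time, rounded to 5 s.
SAMPLE_INTERVAL_SEC = 5.0


def request_rate(prev_count: int, curr_count: int,
                 interval_sec: float = SAMPLE_INTERVAL_SEC) -> float:
    """Requests/sec from two successive readings of a cumulative HTTP counter."""
    if curr_count < prev_count:
        # Assumption: a decreasing counter (e.g. Web server restart) is not
        # discussed in the paper; this sketch simply rejects the sample.
        raise ValueError("cumulative counter decreased; sample discarded")
    return (curr_count - prev_count) / interval_sec


def aggregate(rates: Dict[str, float],
              cpu_pct: Dict[str, float]) -> Tuple[float, float]:
    """System-wide load: total request rate and average CPU utilization."""
    total_rate = sum(rates.values())                  # access numbers are summed
    mean_cpu = sum(cpu_pct.values()) / len(cpu_pct)   # utilization is averaged
    return total_rate, mean_cpu


if __name__ == "__main__":
    # Hypothetical counter readings taken 5 s apart on two Web servers.
    rates = {"web1": request_rate(1200, 1230), "web2": request_rate(800, 820)}
    print(aggregate(rates, {"web1": 60.0, "web2": 40.0}))  # (10.0, 50.0)
```

Summing request counts but averaging utilization follows the text: the request rate is a property of the whole system, while CPU utilization is meaningful per server.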
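The choice of forecasted rate in Section III.C.1 (the maximum over the horizon t of either the forecasted mean or the upper limit of the 95% interval) reduces to a few lines once an ARIMA fit has produced per-step means and standard errors. The sketch below assumes those arrays are already available (in R, for example, `predict()` on an `arima` fit returns them as `pred` and `se`), assumes normally distributed forecast errors so that the upper limit is mean + 1.96 × standard error, and assumes the history is sampled every 5 seconds; the function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Sequence

# Two-sided 95% normal quantile (about 1.96), assuming Gaussian forecast errors.
Z_95 = NormalDist().inv_cdf(0.975)


def horizon_steps(t_sec: float, interval_sec: float) -> int:
    """Number of forecast steps needed to cover the VM build time t."""
    return math.ceil(t_sec / interval_sec)


def forecasted_rate(mean: Sequence[float], stderr: Sequence[float],
                    use_upper_limit: bool = True) -> float:
    """Max forecasted request rate over the horizon (mean or 95% upper limit)."""
    if not mean or len(mean) != len(stderr):
        raise ValueError("mean and stderr must be non-empty and equally long")
    if use_upper_limit:
        return max(m + Z_95 * s for m, s in zip(mean, stderr))
    return max(mean)


if __name__ == "__main__":
    # t = 166 s (Section IV.B) at a 5 s sampling interval needs 34 steps.
    print(horizon_steps(166, 5))                         # 34
    # Hypothetical 3-step forecast, for illustration only.
    mean, se = [5.0, 5.5, 6.0], [0.5, 0.8, 1.0]
    print(round(forecasted_rate(mean, se, False), 2))    # 6.0
    print(round(forecasted_rate(mean, se), 2))           # 7.96
```

Taking the maximum over the whole horizon, rather than only the value at time t, sizes the system for the peak that may arrive while the new VM is still being built.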
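The resource planner's lookup in Section III.C.2 can be illustrated with the capacity figures measured in Section IV.B (2 vCPUs up to 6 req./sec., 3 up to 9, 4 up to 15). This is a hypothetical sketch. The paper keeps the relation in a performance DB, whereas here it is a plain dictionary. Reading "n-vCPU system" as the total vCPUs of the Web/AP layer, using 1-vCPU Web/AP VMs, and refusing forecasts beyond the measured range are all assumptions.

```python
from typing import Dict

# Capacity relation from Section IV.B: total Web/AP vCPUs (assumed reading)
# -> highest HTTP request rate (req./sec.) that still keeps a 95% service
# level guarantee rate at the 0.8 s turnaround target.
CAPACITY: Dict[int, float] = {2: 6.0, 3: 9.0, 4: 15.0}


def required_vcpus(forecast_rate: float,
                   capacity: Dict[int, float] = CAPACITY) -> int:
    """Smallest measured vCPU count whose capacity covers the forecast rate."""
    for vcpus in sorted(capacity):
        if forecast_rate <= capacity[vcpus]:
            return vcpus
    # Assumption: beyond the measured range we refuse rather than extrapolate.
    raise ValueError(f"{forecast_rate} req./sec. exceeds the measured range")


def vms_to_add(forecast_rate: float, current_vcpus: int,
               vcpus_per_vm: int = 1) -> int:
    """VMs to add by scale-out, assuming 1-vCPU Web/AP VMs as in Section IV.A."""
    shortfall = max(0, required_vcpus(forecast_rate) - current_vcpus)
    return -(-shortfall // vcpus_per_vm)  # ceiling division


if __name__ == "__main__":
    print(required_vcpus(7.96))   # 3
    print(vms_to_add(12.0, 2))    # 2
```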
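Step 3 turns the resource decision into an ordered command list for the scaling director. The minimal sketch below uses hypothetical command names (`create_from_template`, `power_on`, `register_lb`, and so on) that stand in for whatever the virtual environment manager actually exposes; they are not vCenter API calls. Deregistering a VM from the load balancer before stopping it is also an assumption.

```python
from typing import List, Tuple

# (command, VM name); the command names are hypothetical placeholders,
# not calls of the vCenter API or any other real interface.
Command = Tuple[str, str]


def scale_plan(add_vms: List[str], remove_vms: List[str]) -> List[Command]:
    """Build an ordered auto-scale plan (Step 3) for the scaling director."""
    plan: List[Command] = []
    for vm in add_vms:
        # Scale-out: copy the template, boot the copy, then let the LB use it.
        plan += [("create_from_template", vm), ("power_on", vm),
                 ("register_lb", vm)]
    for vm in remove_vms:
        # Scale-in (assumed order): drain traffic first, then stop and delete.
        plan += [("deregister_lb", vm), ("power_off", vm), ("delete", vm)]
    return plan


if __name__ == "__main__":
    for step in scale_plan(["web4"], ["web2"]):
        print(step)
```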

D. Scale-up Method
If the load monitoring shows that the response time exceeds a certain threshold, the rule evaluation module carries out a scale-up. According to an article on Web site response times from a usability perspective, the response time should be shorter than one second so as not to interrupt the user's flow of thought [10]. Allowing for network latency, we use 0.8 seconds as the scale-up threshold. When the response time exceeds 0.8 seconds, the scaling director module directs a scale-up using the existing technology [8]. This direction removes the resource limitation, which instantly expands the resources of the target Web server and improves the response time.
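The reactive branch described in the Section III overview and in the scale-up rule above can be sketched as a single decision: when the probed response time exceeds 0.8 s, lift the cap on one capped VM; if no capped VM remains, nothing more can be done reactively. `WebVM` and `react` are hypothetical names. In the paper the cap is removed through vSphere resource settings [8], which this sketch models only as integer vCPU counts.

```python
from dataclasses import dataclass
from typing import List, Optional

# Scale-up threshold from Section III.D: the 1 s usability limit minus a
# margin for network latency.
RESPONSE_TIME_THRESHOLD_SEC = 0.8


@dataclass
class WebVM:
    name: str
    usable_vcpus: int   # vCPUs the VM may currently use
    capped_vcpus: int   # vCPUs allocated but held back by the cap (0 = uncapped)


def react(vms: List[WebVM], response_time_sec: float,
          threshold: float = RESPONSE_TIME_THRESHOLD_SEC) -> Optional[str]:
    """Remove the cap of one capped VM if the response time is too high.

    Returns the name of the scaled-up VM, or None if no action was taken
    (response time acceptable, or no capped VM left to scale up).
    """
    if response_time_sec <= threshold:
        return None
    for vm in vms:
        if vm.capped_vcpus > 0:
            vm.usable_vcpus += vm.capped_vcpus  # extra resource becomes usable
            vm.capped_vcpus = 0
            return vm.name
    return None  # the proactive side must create new capped VMs


if __name__ == "__main__":
    # Similar to the setup of Section IV.D: one VM capped at 1 of its 2 vCPUs.
    vms = [WebVM("web1", 2, 0), WebVM("web2", 1, 1)]
    print(react(vms, 0.6))   # None
    print(react(vms, 1.1))   # web2
    print(vms[1])            # WebVM(name='web2', usable_vcpus=2, capped_vcpus=0)
    print(react(vms, 1.1))   # None
```

Returning `None` once the capped VMs are exhausted mirrors the limitation noted in Section III: reactive scale-up only works while capped VMs remain, which is why the proactive side keeps creating them.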
E. Scale-down and Scale-in Method
If the monitored CPU usage rate is lower than a predetermined lower threshold, the rule evaluation module decides to scale in. Because the proposed method provides VMs with capped resources for scale-up, the scale plan generator chooses an already scaled-up VM, if there is one, as the VM to remove: a scaled-up VM cannot be scaled up any further, so the auto-scaling manager removes such VMs preferentially. When executing a scale-out, the auto-scaling manager then adds VMs that have capped resources and can therefore be scaled up.

In a typical data center, the CPU usage rate is said to be 10% to 50%. Therefore, when the monitored CPU usage rate is less than 10%, the rule evaluation module carries out a scale-in.

IV. EVALUATION

A. Experiment environment
To evaluate the proposed method, we built an experimental system and conducted experiments using real workload data. In this experiment, the VMs of the target IT system run on a virtual environment built on four physical servers. Each physical server has an Intel Xeon E5430 2.66 GHz CPU (up to 8 cores) and 8 GB of memory. To manage the virtual environment, we use VMware vSphere 5.1. In this virtual environment, we prepared the following three types of VM:

Load-balancer VM: CPU: 1 core, memory: 1 GB, OS: Linux (CentOS 6.4), load balancer: LVS, Memcached 1.4.16
Web/AP server VM: CPU: 1 core, memory: 2 GB, OS: Linux (CentOS 6.4), Web server: Apache 2.2.5, application: SugarCRM Community Edition 6.5.14
DB server VM: CPU: 4 cores, memory: 8 GB, OS: Linux (CentOS 6.4), DB server: MySQL 5.1.69

The images of these VMs are stored in an external storage enclosure connected via Fibre Channel. We also prepared another physical server for the auto-scaling manager that implements the proposed method.

We use the open source monitoring software Zabbix [9] as the monitoring module and vCenter as the virtual system manager.

Fig. 4. Experimental environment. [Components: Auto-Scaling Manager, connected via JDBC to the Zabbix DB and via the vCenter API to vCenter; managed IT system (LB, Web/AP, DB) in the virtual environment; JMeter as the load generator.]

B. Parameter
To forecast the workload according to the procedure described in Section III, we measured the time t required for VM creation. In our experiment, it takes 166 seconds until a created VM is recognized by the load balancer. This value is the average of 5 trials measured in the environment used for the evaluation experiment, so we use it as t.

As described in Section III, the amount of resources to be added is calculated from the relationship among the amount of CPU resources, the HTTP access rate, and the response time. We therefore measured the HTTP access rate (requests per second) that keeps a 95% service level guarantee rate for several CPU resource amounts (numbers of virtual CPUs (vCPUs)). As a result, a 2-vCPU system keeps its performance up to an HTTP access rate of 6 requests per second, a 3-vCPU system up to 9, and a 4-vCPU system up to 15. This relationship is stored in the performance DB, and the resource planner module uses it to calculate the required amount of resources.

C. Conventional method
As mentioned in Section I, some existing auto-scaling methods control scaling based on load forecasting. We therefore evaluated an existing auto-scaling method that uses only load forecasting in our experimental system, using 7 days of data from the Web site of the FIFA World Cup 98. We scaled this data down to fit our experimental environment and then measured the response time and CPU resource consumption for HTTP requests following the access pattern of a 10-minute period on day 7. In this pattern, a sudden load change occurs after 6 minutes.

First, we determined the ARIMA model parameters from 6 days of data using the open source statistical analysis tool R and input the model into the load forecasting module. Here, we use ARIMA(2,1,2). The load forecasting module makes its predictions with this model. The load generating software then sends requests based on the real load history of day 7. Fig. 5 shows the real


request rate and the forecasted rate. For this real pattern, the auto-scaling manager controls the scalability based on the load forecasting result: it adds one VM if the forecasted HTTP request rate exceeds the threshold rate that keeps the turnaround time within 0.8 seconds.

Fig. 5. Real workload and predicted request rate. [Axes: request rate (req./sec.) vs. elapsed time (min.); series: real request rate and predicted request rate.]

Fig. 6 shows the response time and the number of vCPUs for the conventional method scaling based on the predicted mean workload. In this experiment, the initial configuration of the Web/AP layer consists of 3 VMs, and the auto-scaling manager added 1 VM after 3 minutes. However, the workload increased rapidly, so the response time of some requests exceeded 0.8 seconds. The final service level guarantee rate is 86.9%. Fig. 7 shows the response time and the number of vCPUs for the conventional method scaling based on the upper limit of the 95% confidence interval. In this experiment, the initial configuration of the Web/AP layer consists of 5 VMs, and the auto-scaling manager added 1 VM after 3 minutes 30 seconds, which keeps the service level guarantee rate at 95% or higher.

Fig. 6. Conventional method (predicted mean). [Axes: response time (sec.) and number of vCPUs vs. elapsed time (min.).]

D. Proposed method
Next, we evaluate the proposed method using the same workload pattern as in the previous section. Initially, we provide 2 VMs with 2 vCPUs each, one of which is capped at 1 vCPU. The auto-scaling mechanism first executes scale-out based on the predicted load. Then, as in the previous experiment, the load changed suddenly after 6 minutes. At that time, the auto-scaling manager detected that the current response time and CPU usage exceeded their thresholds and directed a scale-up. The capping configuration of a Web/AP server VM was removed, making one more vCPU available to the VM. As a result, most of the HTTP requests returned within 0.8 seconds, and the final service level guarantee rate is 99.5%.

Fig. 8. Proposed method. [Axes: response time (sec.) and number of vCPUs vs. elapsed time (min.).]

E. Consideration
Table 1 summarizes the experimental results. The response performance of the proposed method degrades only slightly under the sudden load change. In SLA standard guidelines, the ratio of the number of transactions that can be processed simultaneously should be over 90% for the middle level. For the real Web access workload, both the proposed method and the conventional method scaling based on the upper limit of the 95% confidence interval meet this middle level. The achieved rate is the same, but the proposed method reduces resource consumption by 26% compared with that conventional method.

Table 1. Experimental result summary
Method                  SLA guarantee rate [%]   Resource consumption [vCPU*sec.]
Conventional (mean)     86.9                     2190
Conventional (95%)      99.5                     3390
Proposed method         99.5                     2490
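The two metrics in Table 1 can be computed as below, under our reading of the text: the service level guarantee rate as the percentage of requests answered within 0.8 s, and resource consumption as the number of active vCPUs integrated over time. Both definitions, the function names, and the sample profile are assumptions. The last line only re-derives the 26% reduction from the Table 1 values.

```python
from typing import Iterable, Sequence, Tuple


def guarantee_rate(response_times_sec: Sequence[float],
                   threshold_sec: float = 0.8) -> float:
    """Percentage of requests whose response time is within the threshold."""
    ok = sum(1 for r in response_times_sec if r <= threshold_sec)
    return 100.0 * ok / len(response_times_sec)


def vcpu_seconds(segments: Iterable[Tuple[float, int]]) -> float:
    """Integrate the vCPU count over time; segments are (duration_sec, vcpus)."""
    return sum(duration * vcpus for duration, vcpus in segments)


def reduction_pct(baseline: float, proposed: float) -> float:
    """Relative reduction of `proposed` with respect to `baseline`, in %."""
    return 100.0 * (baseline - proposed) / baseline


if __name__ == "__main__":
    print(guarantee_rate([0.3, 0.5, 0.7, 0.9]))     # 75.0
    # Hypothetical 10-minute profile: 3 vCPUs for 3 min, then 4 vCPUs.
    print(vcpu_seconds([(180, 3), (420, 4)]))       # 2220
    # Table 1: Conventional (95%) 3390 vs. proposed 2490 vCPU*sec.
    print(round(reduction_pct(3390, 2490), 1))      # 26.5
```

The same arithmetic shows the cost of the reactive reserve: the proposed method uses 2490 / 2190, or about 14% more, vCPU-seconds than the mean-based conventional method, in exchange for raising the guarantee rate from 86.9% to 99.5%.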
Fig. 7. Conventional method (upper limit of 95% confidence interval). [Axes: response time (sec.) and number of vCPUs vs. elapsed time (min.).]

V. RELATED WORK
Galante et al. classify auto-scaling methods into predictive and reactive ones [2]. In the predictive category, Mao et al. describe an auto-scaling method based on the amount of resources predicted by task scheduling [3]. Hanai et al. propose a method of predicting the resource amount by multi-agent simulation [4]. Many existing studies address auto-scaling based on the amount of resources obtained by forecasting or scheduling. In contrast, Shen et al. point out that it is difficult to predict workload fluctuations from the load history, and they propose a method that defines a penalty for service level violations and minimizes the sum of the penalties [5]. Further, Deng et al. propose a method of
scale-up implemented using existing technology, together with the prediction of the resource usage limit using general load forecasting models [12].

In this way, many existing auto-scaling technologies execute auto-scaling based on the amount of resources obtained by forecasting or scheduling. However, as more people access Web applications from their mobile devices at any time, unpredictable workload fluctuations can always occur. Such workloads are difficult to handle with these existing studies and technologies.

VI. CONCLUSION
In this paper, we proposed an auto-scaling method that ensures the service level when unpredictable load changes occur. In the proposed method, the auto-scaling mechanism changes the scalability based on the predicted workload and, in addition, performs a scale-up when an unpredictable workload fluctuation is detected. An evaluation experiment using access rates based on a real workload confirmed that the proposed method ensures the service level against unpredictable workload changes.

REFERENCES
[1] L. A. Barroso, J. Clidaras, and U. Hoelzle, “The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines: Second Edition,” Morgan and Claypool Publishers, 2013.
[2] G. Galante and L. C. E. de Bona, “A Survey on Cloud Computing Elasticity,” Proc. of 2012 IEEE Fifth Int’l Conf. on Utility and Cloud Computing (UCC 2012), IEEE, 2012.
[3] M. Mao and M. Humphrey, “Auto-Scaling to Minimize Cost and Meet Application Deadlines in Cloud Workflows,” Proc. of 2011 IEEE Int’l Conf. for High Performance Computing, Networking, Storage and Analysis (SC 2011), pp.1-12, Nov. 2011.
[4] M. Hanai, T. Suzumura, A. Ventresque, and K. Shudo, “An Adaptive VM Provisioning Method for Large-Scale Agent-based Traffic Simulations on the Cloud,” Proc. of 6th IEEE Int’l Conf. on Cloud Computing Technology and Science (CloudCom 2014), pp.130-137, Dec. 2014.
[5] Z. Shen, S. Subbiah, X. Gu, and J. Wilkes, “CloudScale: Elastic Resource Scaling for Multi-Tenant Cloud Systems,” Proc. of the 2nd ACM Symposium on Cloud Computing, pp.5:1-5:14, Oct. 2011.
[6] D. Huang, B. He, and C. Miao, “A Survey of Resource Management in Multi-Tier Web Applications,” IEEE Communications Surveys & Tutorials, vol. 16, no. 3, pp.1574-1590, Jan. 2014.
[7] K. Hwang, Y. Shi, and X. Bai, “Scale-Out vs. Scale-Up Techniques for Cloud Performance and Productivity,” Proc. of 6th IEEE Int’l Conf. on Cloud Computing Technology and Science (CloudCom 2014), pp.763-768, Dec. 2014.
[8] VMware, Inc.: VMware White Paper: Virtualizing Business-Critical Applications on vSphere (online), available from < solutions/ VMware-Virtualizing-Business-Critical- Apps-on-VMware en-wp.pdf>
[9] Zabbix - The Enterprise-Class Open Source Network Monitoring Solution (online), available from <> (accessed 2015-07-20).
[10] Nielsen Norman Group: Website Response Times (online), available from < articles/website-response-times/>
[11] D. MacDonald and W. Barkley: Microsoft White Paper: Microsoft Windows 2000 TCP/IP Implementation Details (online), available from < download/7/7/1/7716a332-d3af-4ad5-b249- 38ca97db023e/tcpip2000.doc> (accessed 2015-07-20).
[12] D. Deng, Z. Lu, W. Fang, and J. Wu, “CloudStream-Media: A Cloud Assistant Global Video on Demand Leasing Scheme,” Proc. of 2013 IEEE Int’l Conf. on Service Computing (SCC 2013), pp.486-493, Jul. 2013.