
2019 IEEE 12th International Conference on Cloud Computing (CLOUD)

Exploring Potential for Non-Disruptive Vertical Auto Scaling and Resource Estimation in Kubernetes

Gourav Rattihalli, Madhusudhan Govindaraju, Hui Lu
Cloud and Big Data Lab
State University of New York at Binghamton
Binghamton, United States
{grattih1, mgovinda, huilu}@binghamton.edu

Devesh Tiwari
Department of Electrical and Computer Engineering
Northeastern University
Boston, United States
d.tiwari@northeastern.edu

Abstract—Cloud platforms typically require users to provide resource requirements for applications so that resource managers can schedule containers with adequate allocations. However, the requirements for container resources often depend on numerous factors such as application input parameters, optimization flags, input files, and attributes that are specified for each run. So, it is complex for users to estimate the resource requirements for a given container accurately, leading to resource over-estimation that negatively affects overall utilization. We have designed a Resource Utilization Based Autoscaling System (RUBAS) that can dynamically adjust the allocation of containers running in a Kubernetes cluster. RUBAS improves upon the Kubernetes Vertical Pod Autoscaler (VPA) system non-disruptively by incorporating container migration. Our experiments use multiple scientific benchmarks. We analyze the allocation pattern of RUBAS with Kubernetes VPA. We compare the performance of container migration for in-place and remote node migration and we evaluate the overhead in RUBAS. Our results show that compared to Kubernetes VPA, RUBAS improves the CPU and memory utilization of the cluster by 10% and reduces the runtime by 15% with an overhead for each application ranging from 5% to 20%.

I. INTRODUCTION

The Cloud Native Computing Foundation (CNCF) has driven the adoption of containers in Cloud infrastructures, and containers are currently used by over 50,000 projects [1]. The key platform in CNCF is Kubernetes, which is also CNCF's first project. The Kubernetes project is highly active with over 2,300 unique contributors and has emerged as the most popular container orchestration platform today. Kubernetes is reportedly used by companies with massive scale computing infrastructures such as Google, Yahoo, Intel, Comcast, IBM, and eBay. Kubernetes provides important features for computing on the cloud such as service discovery, load balancing, automatic bin packing of containers based on resource requirements, horizontal scaling of containers out of the box, and self healing – when a node fails, Kubernetes automatically re-schedules the applications on a healthy node.

However, just like other application deployment systems, Kubernetes too suffers from a resource estimation problem. In clusters and clouds, a user's estimate of required resources (e.g., CPU, memory, GPU, I/O, network bandwidth) and the configuration of each node, or VM, for the experiments is critical in allocating VMs or bare metal nodes, and scheduling jobs. Accurate resource estimation is a challenge for end users as the requirement for applications can vary in each run because it is dependent on various configuration values such as the input file sizes, optimization flags, input parameter choices, and the core application kernel.

Errors in resource requirements for applications, specified by users, are a well-known problem and are not restricted to the use of Kubernetes [2] [3]. Cloud resource managers such as YARN [4], Omega [5], Mesos schedulers such as Apache Aurora [6] and Marathon [7], along with HPC resource managers such as Torque [8] – all require estimation for each application before it is launched.

It has been recorded that users in about 70% of cases request more resources than required [9]. In [9], Delimitrou et al. also argue that over-allocation of resources for applications causes increased wait times for pending tasks in the queue, reduced throughput, and under-utilization of the cluster. Incorrect estimation of resources can significantly increase the overall cost, in Service Units (SUs), charged by cloud platforms running a set of applications [10].

We also analyzed the Azure Public Dataset [11][12], which was released by Microsoft. In Figure 1 (top subplot), we show the average CPU utilization and peak average CPU utilization over a 24 hour period. The average CPU utilization is always under 10%, while the peak utilization hovers between 14-17%. In Figure 1 (bottom subplot), for data collected over 30 days, we show the number of Virtual Machines that had average CPU utilization in 25% ranges. We again see that over 1.5 million Virtual Machines had average CPU utilization between 0-25%.
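
To make the aggregation behind Figure 1 concrete, the following is a minimal sketch of how such statistics can be derived from per-VM CPU readings. The file name and column names (timestamp_hour, vm_id, avg_cpu) are illustrative assumptions and do not necessarily match the actual schema of the Azure Public Dataset.

```python
# Minimal sketch of the kind of aggregation behind Figure 1, assuming a
# hypothetical CSV of per-VM CPU readings with columns "timestamp_hour",
# "vm_id", and "avg_cpu" (the real Azure Public Dataset schema may differ).
import pandas as pd

readings = pd.read_csv("vm_cpu_readings.csv")  # hypothetical file name

# Top subplot: average and peak-average CPU utilization per hour of the day.
hourly = readings.groupby("timestamp_hour")["avg_cpu"]
avg_per_hour = hourly.mean()       # "Average CPU Usage"
peak_avg_per_hour = hourly.max()   # "Peak Average CPU Usage"

# Bottom subplot: number of VMs whose average utilization over the whole
# trace falls into each 25% bucket.
per_vm_avg = readings.groupby("vm_id")["avg_cpu"].mean()
buckets = pd.cut(per_vm_avg, bins=[0, 25, 50, 75, 100])
print(buckets.value_counts().sort_index())
```
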
Kubernetes attempts to address the resource estimation problem via the Vertical Pod Autoscaler (VPA) [13] feature, which does not require any user input on resource requirements. The VPA generates an estimate based on the resources the application is currently using and then corrects it by re-scheduling the application with the newly estimated resources.

Fig. 1: Azure Cluster CPU utilization analysis: the top graph shows the cluster peak average and average CPU utilization over a 24 hour period. The bottom graph shows the number of virtual machines with average CPU utilization in the respective ranges over a 30-day period.

However, it has a disruptive approach. Its correction approach requires killing the application and then re-scheduling it with the estimated resources. While this approach works for stateless services, it is a serious drawback for stateful and performance sensitive applications.

Therefore, this paper explores new ways to augment container orchestration in Kubernetes that dynamically estimate the amount of resources an application requires, non-disruptively migrate the application, update the resource allocation, and continue with the application execution.

We have designed a flexible and configurable framework, the Resource Utilization Based Autoscaling System (RUBAS), to address the resource correction problem in a Kubernetes setup. RUBAS starts with the users' resource requests for each application and launches it on the cluster. It then monitors the application's resource usage via the Metrics Server at a configurable interval, with the default set to 60 seconds. If an application's usage deviates by more than 10% from the allocated resources, then RUBAS creates a new estimate, checkpoints the container in which the application is running, and re-schedules it on the cluster with the new estimate. Each new estimate is calculated as the sum of the median and a buffer, where the buffer is defined as the positive deviation of the observations. This approach gives us a better estimate compared to the Kubernetes VPA, which uses the peak values rather than central tendencies. The peak observation could be an outlier in the observed data and could result in over-allocation. The sum of the median and the positive deviation also negates the effects of outliers. RUBAS can re-schedule on the same or a different node in the cluster. The monitoring interval is a configurable parameter and can be set by the cluster administrator. RUBAS uses Checkpoint Restore in Userspace (CRIU) to create checkpoints and restore Docker container execution, and therefore, does not require the application to be killed and restarted. RUBAS migrates workloads to nodes in the cluster that have the required resources. Unlike Kubernetes, it does not assume that the resource allocation in an existing node can be scaled up seamlessly. We note that the VPA approach of vertical auto scaling is not possible with cloud resource scheduling tools such as Mesos because allocation changes are not possible once an application has been scheduled. However, the RUBAS approach could be extended to Mesos.
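
The following is a minimal sketch of the correction trigger described above, under the reading that a correction is needed when usage deviates from the allocation by more than 10%. The helper functions are hypothetical stand-ins for the Metrics Server query and the RUBAS actions, not the actual RUBAS implementation.

```python
# Sketch of the re-estimation trigger: if observed usage deviates from the
# current allocation by more than 10%, a new estimate is produced and the
# container is checkpointed and re-scheduled. get_usage(), get_allocation(),
# estimate(), and migrate() are hypothetical helpers.

THRESHOLD = 0.10   # 10% deviation from the allocation
INTERVAL = 60      # default monitoring interval in seconds

def needs_correction(usage: float, allocation: float) -> bool:
    """Return True when usage is more than 10% away from the allocation."""
    return abs(usage - allocation) > THRESHOLD * allocation

def monitor_once(container, get_usage, get_allocation, estimate, migrate):
    usage = get_usage(container)            # from the Metrics Server
    allocation = get_allocation(container)
    if needs_correction(usage, allocation):
        new_allocation = estimate(container)   # median + buffer (Sec. III-B)
        migrate(container, new_allocation)     # checkpoint and re-schedule
```
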
While RUBAS works best for target applications that do not have frequent resource variations, its design can also cater to those with frequent variations via multiple migrations. We note that several HPC applications have a resource consumption pattern wherein the resource usage does not vary frequently after the initial phase [14].

Specifically, we make the following contributions:
• This paper describes the design and implementation of a new non-disruptive and dynamic vertical scaling system for Kubernetes, called RUBAS.
• We study and quantify the trade-offs of different design choices on the performance of RUBAS, including the profiling interval, container migrations, and application restart overheads.
• We present results and analysis of RUBAS performance compared to the Kubernetes VPA in terms of CPU and memory allocation patterns, application runtime, and overall cluster resource utilization.

II. DESIGN

A. Kubernetes Architecture

In Figure 2, we show the components of a typical Kubernetes setup. It has a Master and Worker nodes along with etcd for maintaining the state of the Kubernetes cluster. The master node consists of an API server, a Scheduler, and a Controller Manager. The controller manager is responsible for the core functions of Kubernetes. These include the replication controller, endpoints controller, namespace controller, and service accounts controller. The API server helps in validating and configuring the data for API objects such as pods and services. The scheduler has many policies and is topology aware. It uses the topology information to appropriately schedule the pod on a node based on its policy requirement. The worker nodes in Kubernetes consist of a kubelet, a proxy system, and a Docker runtime environment. The kubelet is responsible for communication between the worker node and the master node. It maintains the state of the worker node and executes the pods as defined in the PodSpec provided by the master. The PodSpec is a YAML or JSON object that describes the pod. The kubelet reads the PodSpec and executes the pod and its corresponding container via Docker. The proxy is used to communicate between the pods and sets up the network infrastructure [15].

Fig. 2: General architecture of Kubernetes showing different components of a typical Kubernetes setup.

B. RUBAS design

In Figure 3, we list the steps involved in executing a job. An application is first submitted to RUBAS. RUBAS creates the launch specification required for the job to execute on the Kubernetes cluster, which includes a unique id to monitor the application's resource usage. The specification is sent to Kubectl, which is a Kubernetes component for managing the cluster. Kubectl forwards the job to the Kubernetes Master. The Kubernetes Master schedules the application on the worker nodes. In step 5, the metrics server collects resource utilization details of each container running on the cluster. In step 6, RUBAS checks the utilization of each container against the allocation. If the allocation and the utilization match, then RUBAS allows the execution to continue. If the allocation and the utilization do not match, then in step 7, RUBAS sends two instructions to Docker – one to checkpoint the container and another to create an image that contains the data generated by the container. Docker then creates these images in step 8. In step 9, the image and checkpoint are stored in NFS, so that they are accessible across the cluster. In step 9a, the information of the created checkpoint and the image is sent to RUBAS. In step 10, RUBAS creates the launch specification with the checkpoint and the image and sends it to kubectl. Kubectl forwards the job to the Kubernetes master, which then schedules the job on the worker nodes. In step 13, the checkpoint and the image are downloaded to the selected machine and the container is restored.

Fig. 3: Architecture of RUBAS: workflow of profiling, checkpointing and resuming applications in the cluster.
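The checkpoint-and-restore portion of this workflow (steps 7-13) can be sketched at the command level as follows, assuming Docker's experimental checkpoint support (backed by CRIU) is enabled and that the checkpoint directory lives on the shared NFS mount. The container names, registry, and paths are placeholders, not the exact commands issued by RUBAS.

```python
# Command-level sketch of steps 7-13 above, assuming Docker's experimental
# checkpoint support (CRIU) is enabled and /mnt/nfs is the shared NFS mount;
# container, registry, and checkpoint names are illustrative only.
import subprocess

def checkpoint_and_stage(container_id: str, app_id: str, nfs_dir: str = "/mnt/nfs"):
    checkpoint_dir = f"{nfs_dir}/{app_id}/checkpoint"
    image_tag = f"registry.local/{app_id}:migrated"   # hypothetical registry

    # Steps 7-8: checkpoint the running container and commit its writable
    # layer so that the data generated by the container is preserved.
    subprocess.run(["docker", "checkpoint", "create",
                    "--checkpoint-dir", checkpoint_dir,
                    container_id, "chk1"], check=True)
    subprocess.run(["docker", "commit", container_id, image_tag], check=True)

    # Step 9: the checkpoint directory already lives on NFS; push the image
    # so it is reachable from any worker node.
    subprocess.run(["docker", "push", image_tag], check=True)
    return checkpoint_dir, image_tag

def restore_on_target(image_tag: str, checkpoint_dir: str, name: str):
    # Step 13 (on the selected node): create a container from the committed
    # image and start it from the CRIU checkpoint instead of from scratch.
    subprocess.run(["docker", "create", "--name", name, image_tag], check=True)
    subprocess.run(["docker", "start",
                    "--checkpoint-dir", checkpoint_dir,
                    "--checkpoint", "chk1", name], check=True)
```
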
III. VERTICAL AUTO-SCALING

A. Discussion: Kubernetes VPA

The current implementation of VPA does not require the user to provide any resource requests, for CPU and memory, for a pod. By default, the initial allocation in Kubernetes is a configurable lower limit or the minimum guaranteed resources allocated for a pod. The VPA uses a recommender, which tracks the historical usage of the pod and suggests a new limit whenever needed. The default formula used by the recommender for new resources is the following:

New Resources = max{peak + MinBumpUp, peak x BumpUpRatio}

where peak = maximum memory used in the observed data, MinBumpUp = 100 MB, and BumpUpRatio = 1.2.

While these equations are modifiable, this resource correction approach itself has drawbacks. For example, the peak memory usage could have been a temporary spike in the application, but VPA aims to allocate for that amount throughout the application's duration. Furthermore, the current VPA implementation restarts the pod every time the allocation is changed. Such a disruptive process results in a loss of the state of the containers. Though Docker released an update to allow changing the resources allocated to running containers, it cannot be directly used with large distributed systems like Kubernetes. This is because there are various components, such as the scheduler, controller manager and kubelet, that depend on the original allocation to make decisions. Unless all these components are aligned, it is currently difficult to leverage the new Docker update feature for VPA.
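For illustration, the default recommendation rule quoted above can be written as a few lines of code; the constants follow the values stated in the text, and the example data is made up.

```python
# Sketch of the default VPA recommendation rule quoted above.
MIN_BUMP_UP_MB = 100   # MinBumpUp = 100 MB
BUMP_UP_RATIO = 1.2    # BumpUpRatio = 1.2

def vpa_new_resources(observed_mb):
    """New Resources = max{peak + MinBumpUp, peak * BumpUpRatio}."""
    peak = max(observed_mb)  # peak memory used in the observed data
    return max(peak + MIN_BUMP_UP_MB, peak * BUMP_UP_RATIO)

# Example: a temporary 900 MB spike dominates the recommendation.
print(vpa_new_resources([210, 220, 215, 900, 230]))  # -> 1080.0
```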

B. Resource Estimation: RUBAS vs. VPA

• RUBAS has two key modules:
  – A runtime profiling and estimation system. In earlier work, we have shown the effectiveness of a similar runtime profiling system [16] [17].
  – A vertical scaling system that can change the allocation at runtime non-disruptively (if required).
• RUBAS schedules all the applications as they arrive on the cluster and starts collecting resource usage data at one second intervals.
• The profiling data collection involves CPU and memory usage and is obtained via the Metrics Server. Based on this data, a new estimation is generated every 60 seconds.
• Based on the data collected every 60 seconds, an estimate is generated using the following formula (a code sketch of this computation follows this list):

buffer = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2}

Required Resource = Median of Observations + buffer

In the formula, N denotes the total number of observations, x_i denotes the i-th observation, and \bar{x} denotes the arithmetic mean of the observations. The buffer is the standard deviation of the observed values.
• As the applications are initially scheduled without any resource allocation, we set the limits via these patching options:
  – Disruptive: The applications are restarted upon patching. This is used when an application restart does not affect the outcome.
  – Non-Disruptive: The applications are checkpointed and restarted using the checkpoint. This process may involve container migration. If the estimated amount of resources is not available on the same node, the application is migrated to another node that has the availability. However, in case there are no nodes with the available resources, the application continues execution in its current state.
• The process of estimation and patching continues until the application terminates.
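The sketch below implements the estimation rule from the list above: the median of the observed values plus a buffer equal to their sample standard deviation. The example values are made up, purely to contrast the behaviour with the peak-based VPA rule.

```python
# Sketch of the RUBAS estimate from the list above: the median of the
# observations plus a buffer equal to their sample standard deviation.
import statistics

def rubas_estimate(observations):
    buffer = statistics.stdev(observations)   # sqrt(sum((x - mean)^2) / (N - 1))
    return statistics.median(observations) + buffer

# Example: the 900 MB outlier shifts the estimate far less than the
# peak-based VPA rule shown earlier (roughly 525 vs. 1080).
print(round(rubas_estimate([210, 220, 215, 900, 230]), 1))
```
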
IV. EVALUATION

A. Evaluation Setup

1) Setup: The cluster setup for the experiments consisted of 30 Kubernetes worker nodes and 1 Kubernetes master node, all in the AWS cloud. The details of the setup are presented in Table I. Table II provides details about the benchmarks used as workloads in the experiments.

TABLE I: Description of the infrastructure used in the experiments on the Amazon AWS EC2 Cloud cluster.
Equipment/OS/Software                  Description/Version
AWS EC2 Nodes (31), Compute Optimized  c5.2xlarge, 8-core processor at 2.3 GHz, 16 GB DDR4 RAM
Operating System                       Ubuntu 18.04.1 LTS
Docker                                 17.06.1
Kubernetes                             1.12.3

TABLE II: Description of the benchmarks from PARSEC (1-7) and DGEMM used in the experiments.
Workload           Description
1. Blackscholes    Computational financial analysis application
2. Canneal         Engineering application
3. Ferret          Similarity search application
4. Fluidanimate    Application consisting of animation tasks
5. Freqmine        Data mining application
6. Swaptions       Financial analysis application
7. Streamcluster   Data mining application
8. DGEMM           Dense-matrix multiply benchmark

B. Determining the best monitoring interval

The monitoring interval has a significant effect on the resource estimation and needs to be determined experimentally for each class of applications. We conducted experiments for monitoring intervals ranging from 15 seconds to 90 seconds. Figure 4 presents the results at intervals of 15, 30 and 60 seconds along with the runtime. In these experiments we initially allocated resources based on static profiling. The number in each bar represents the number of container migrations that were performed before the completion of execution. We observe that at 15 seconds, there are several container migrations that result in high overhead. For example, DGEMM has 8 container migrations and the runtime is almost 5 times the runtime for DGEMM when it is run with the optimal resources as determined by static profiling. The overhead can be attributed to estimation error due to insufficient data points. At the 30 second interval, we see a reduction in the number of container migrations and the runtime improves. However, it is still higher than the runtime achieved when the optimal resources are allocated. At the 60 second interval, we observe the least number of container migrations and the runtimes are very close to the runtimes obtained with optimal resource allocations. The overhead due to the estimation and container migration on the benchmarks ranges from 5% to 20%. At the 75 and 90 second intervals, we do not see any improvement in either the number of container migrations or the runtime. We thus conclude that 60 seconds is the ideal monitoring interval for the given set of benchmarks.

Fig. 4: Determining the best monitoring interval. The number in each bar represents the migrations for allocations corresponding to different monitoring intervals.

C. Comparison with Kubernetes VPA

In Figure 5, we provide a comparison with Kubernetes VPA for runtime and number of corrections. The presented results are the average of 10 executions. We conducted experiments for 3 different settings: (1) RUBAS without resource allocation restrictions; (2) Kubernetes VPA without resource allocation restrictions; and (3) Kubernetes VPA with 30% more resources than the optimal requirement, as determined by static profiling. The results show that even with grossly incorrect allocations, RUBAS is able to estimate and correct the allocation in all the applications. In comparison, Kubernetes VPA performs multiple restarts, which increases the overhead and in some cases the runtime is more than twice that of RUBAS. However, when the resource allocation is closer to the optimal resources, Kubernetes VPA and RUBAS have similar performance. In the case of Blackscholes and Canneal, Kubernetes VPA performs better than RUBAS. It is to be noted that for Kubernetes to outperform RUBAS, the user needs to know in advance the near optimal resource requirement for each application. When such information is not available, RUBAS is a better choice compared to Kubernetes VPA.

D. Comparison of allocation, RUBAS and Kubernetes VPA

In Figure 6, we show the allocations made by RUBAS and Kubernetes VPA at various stages of execution for the DGEMM application. Kubernetes VPA takes 4 corrections for the DGEMM benchmark to reach the optimal CPU allocation, whereas RUBAS takes only 2 corrections. For memory allocations, Kubernetes VPA takes 3 corrections while RUBAS takes 2 corrections. As RUBAS has more accurate resource estimates, it performs fewer corrections. Due to the inclusion of container migration, the runtime is lower by 63% in the case of RUBAS. Kubernetes VPA restarts after every correction, whereas RUBAS resumes from the checkpoint.

Fig. 5: RUBAS vs. Kubernetes VPA: runtime and number of corrections.

Fig. 6: CPU and memory allocation with estimation for the DGEMM benchmark in RUBAS and Kubernetes VPA setups.

E. Disruptive vs Non-Disruptive Vertical Scaling

In Figure 7, we show the difference in runtime while using the RUBAS estimation methodology. For the experiment we used a 60 second monitoring interval as explained in Section IV-B. In all cases, we see an improvement in runtime when using migrations. For DGEMM, we see a 49.10% decrease in runtime, Blackscholes has a 23.65% decrease, Canneal has 23.5%, Ferret has 11.97%, Fluidanimate has 3.33%, Freqmine has 7.81%, Swaptions has 13.49%, and Streamcluster has a 4.91% decrease in runtime. As the size of the migration image increases, the runtime gain decreases, as can be seen with applications such as Fluidanimate, Freqmine and Streamcluster where the gains are less than 10%. On the other hand, the benefits of migration are more pronounced when the initial estimations are either incorrect or the application changes its requirement. For example, in DGEMM, the application slowly increases its usage and this leads to incorrect estimations, which then causes multiple migrations/restarts.

In Figure 8, we compare the migration costs of RUBAS for in-place scaling on the same node versus scaling on another node. We used the nodeSelector option in the Kubernetes PodSpec to control in-place vs remote node migration. By pinning the pod to a particular node we were able to perform in-place migrations, which resulted in runtime gains. We notice a 4.1% decrease in runtime for DGEMM, 2.09% for Blackscholes, 2.61% for Canneal, 5.62% for Ferret, 6.89% for Fluidanimate, 5.73% for Freqmine, 4.58% for Swaptions, and 2.13% for Streamcluster.
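As an illustration of the pinning described above, a restored pod can carry a nodeSelector on the standard kubernetes.io/hostname label so that it is re-scheduled onto the same node. The specification below is a minimal hand-written example with placeholder names and resource values, not the actual RUBAS launch specification.

```python
# Minimal illustration of pinning a restored pod to its original node with
# nodeSelector (via the standard kubernetes.io/hostname label) so that the
# migration stays in place; names and resource values are placeholders.
restored_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "dgemm-restored", "labels": {"rubas-id": "job-42"}},
    "spec": {
        "nodeSelector": {"kubernetes.io/hostname": "worker-node-7"},
        "containers": [{
            "name": "dgemm",
            "image": "registry.local/job-42:migrated",   # checkpointed image
            "resources": {
                "requests": {"cpu": "4", "memory": "600Mi"},
                "limits":   {"cpu": "4", "memory": "600Mi"},
            },
        }],
    },
}
```

Dropping the nodeSelector field lets the scheduler place the restored pod on any worker with sufficient resources, which corresponds to the remote-node migration case.
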
Fig. 7: Migration vs restart with an interval size of 60 seconds.

Fig. 8: In-place scaling vs remote node scaling.

F. Cluster utilization

1) Setup: We ran a total of 10 iterations, where each iteration consisted of 135 instances of each benchmark presented in Table II, i.e., a total of 1080 jobs. The experiment was executed on a cluster of 30 c5.2xlarge nodes as described in Table I. Apart from the execution process explained in Section II-B, the cluster experiments had some additional conditions for migration. If the estimated resource requirement was more than the available resources in the cluster, then the application was allowed to continue execution, as it is better to continue execution rather than wait in the queue till the cluster has available resources. In the meanwhile, when the estimated resources were generated, and if it was determined that the application needed to be scaled, then a checkpoint was created regardless of the resource availability. The idea was to address situations where an application fails due to incorrect resource allocation while it is allowed to continue execution.

2) Performance (runtime): In Figure 9, we show the average runtime of default Kubernetes, Kubernetes VPA and RUBAS. All three setups were provided with the same set of applications with the same resource requests, i.e., optimal + 30% resources. In our experiments, we see that default Kubernetes, with no ability to modify the resources, takes much longer to complete. On average, default Kubernetes took 32162 seconds. In comparison, Kubernetes VPA has a much better runtime of 23127 seconds on average. This amounts to about a 28% decrease in runtime. Kubernetes with VPA performed much better than the default Kubernetes even by restarting the pods. Restarting the pods should have been a detrimental step, but because VPA is able to correct the allocation and make resources available, more applications are launched by Kubernetes, resulting in lower runtime. Our approach, RUBAS, achieves further improvement by avoiding the overhead of restarts through container migration. We see that with RUBAS the runtime further reduces to 19457 seconds. This amounts to about a 16% improvement over Kubernetes VPA and about a 40% improvement over default Kubernetes.

Fig. 9: Runtime results for a cluster with 30 nodes, running 1080 jobs to completion.

3) CPU utilization: In Figure 10, we show the CPU utilization corresponding to the allocation. In default Kubernetes, we see that only about 47% of the allocated CPU was used by the applications. This means that more than half of the allocated CPU was idle. As there are several applications waiting in the queue, the under-utilization of resources effectively increases the overall runtime. Kubernetes with VPA was able to correct the allocation and thus had a much better utilization at about 72%. However, Kubernetes VPA has a drawback in the way the applications are restarted. Instead of restarting the application immediately, we observed that Kubernetes puts the task in the queue, which delays the execution of the application. In situations where the application is providing a service, there could be a long gap till the application is restarted. We address this issue in RUBAS. The application that needs to be resumed has the highest priority in the RUBAS queue. With RUBAS, using container migration, we see even better CPU utilization at about 82%.

Fig. 10: CPU utilization results for a cluster with 30 nodes, running 1080 jobs to completion.

4) Memory utilization: In Figure 11, we show the average memory utilization of the cluster with default Kubernetes, Kubernetes with VPA, and RUBAS. The trends are similar to the CPU utilization data presented in Section IV-F3. Default Kubernetes had an average memory utilization of 43% when the allocation was 100%. Kubernetes VPA improved the utilization to 76% as it was able to re-size the applications. RUBAS further improved the utilization to 86%.

Fig. 11: Memory utilization results for a cluster with 30 nodes, running 1080 jobs to completion.
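The cluster-level utilization figures reported in this section can be thought of as total observed usage divided by total allocation across all containers. The following sketch shows that computation on hypothetical samples; in RUBAS the per-container values come from the Metrics Server.

```python
# Sketch of the cluster-level utilization metric reported here: total usage
# observed across containers divided by total allocation. The samples list
# is hypothetical; in RUBAS the values come from the Metrics Server.
def cluster_utilization(samples):
    """samples: (allocated_cores, used_cores) pairs, one per container
    per sampling interval."""
    allocated = sum(a for a, _ in samples)
    used = sum(u for _, u in samples)
    return 100.0 * used / allocated

samples = [(8.0, 3.7), (8.0, 4.1), (4.0, 3.2)]
print(f"{cluster_utilization(samples):.1f}% of allocated CPU used")  # 55.0%
```
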
G. Effects of Migration - Discussion

1) Ability to vertically scale ad-hoc jobs: Vertical scaling of service jobs does not require migration, as the services generally do not need to maintain state after a restart. Also, service jobs generally have replicated instances that help with load balancing. So a restart for these jobs is not very critical. However, ad-hoc or one-time jobs need to maintain the state of execution or else they will have to start execution again from the beginning. We have shown that RUBAS can migrate ad-hoc jobs and continue the execution while being able to vertically scale them.

2) Ripple Effect of Migrations: A ripple effect happens when the container keeps migrating from one machine to another without any actual work being done. This happens when the time frame for estimation is small and the estimation is inaccurate. We see such an example for the 15 second interval in Figure 12, which has 8 migrations and a runtime that is twice the runtime of the 60 second interval.

3) Migration is Not Always the Best Choice: There are some situations where container migration is not the best choice. Stateless applications are not required to maintain state. So, instead of migration, a restart of the application with the appropriate resources is a better choice. Most stateless applications are also replicated for the purpose of load balancing. If one of the replicated applications is restarted, there will only be a momentary degradation of the service until a new replica takes over.

4) Stability with Multiple Container Migrations: In our experiments we observed that when an application is migrated more than 4 times, the migration image is sometimes generated incorrectly. This is a current limitation of CRIU.
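Given this limitation, one simple guard that could be placed around the migration path is to stop migrating a container after it has already been migrated 4 times and fall back to a plain restart instead. The sketch below only illustrates the idea; the helper functions are hypothetical and this is not part of the RUBAS implementation described in the paper.

```python
# A small guard around the migration path, reflecting the observations in
# this section: fall back to a plain restart once a container has already
# been migrated 4 times, since CRIU images become unreliable beyond that.
# migrate() and restart() are hypothetical helpers.
MAX_MIGRATIONS = 4
migration_count = {}  # container id -> number of completed migrations

def scale_container(container_id, new_allocation, migrate, restart):
    count = migration_count.get(container_id, 0)
    if count >= MAX_MIGRATIONS:
        # Too many migrations: avoid a possibly corrupt checkpoint image.
        restart(container_id, new_allocation)
        return
    migrate(container_id, new_allocation)
    migration_count[container_id] = count + 1
```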

Fig. 12: Effects of incorrect estimations caused by a smaller monitoring interval. Each drop in CPU utilization is due to a migration (8 in total for the 15 second interval). The benchmark used is DGEMM and the monitoring intervals are 15, 30, and 60 seconds.

5) Incorrect estimations could lead to higher execution time: In applications that generate a lot of data, the migration times are high as the generated data needs to be packaged in a Docker image and then copied to the target machine. This effect is quantified in Figure 8, where we observe a considerable increase in runtime for the Ferret, Fluidanimate and Freqmine applications due to migration. In Figure 12, at the 15 second interval we have a runtime of over 300 seconds for DGEMM, while at the 60 second interval it is less than half. The smaller monitoring interval causes incorrect estimations and that leads to higher runtime.

6) Migration Downtime: Container migration involves multiple steps, which are time consuming. These steps include (a timing sketch follows this list):
• Checkpoint Creation: This process is taken care of by CRIU. It creates a checkpoint of the current state of the Docker container. However, it does not checkpoint any changes that the container has made on the storage.
• Creating the checkpointed Docker image: As CRIU does not create a checkpoint for the data generated by the container, RUBAS needs to handle it separately. RUBAS creates a new Docker image from the running container and copies all the data that is generated by the container. This process is time consuming and the overhead depends on the size of the generated data.
• Transferring the checkpoint and the checkpointed Docker image to the target machine: Once the application is re-scheduled to be launched on a node by Kubernetes, we need to transfer the checkpoint and the checkpointed Docker image to the target node. This could be expensive if the application is relaunched on a different node. We note this behaviour in Figure 8 for Ferret, Fluidanimate, and Freqmine, where the transfer time is about 15-20 seconds. This is due to the size of the checkpointed Docker image and the network bandwidth between the two nodes.
• Resuming from the checkpoint: CRIU enables resumption of the container on the target node. CRIU uses the newly created Docker image from the checkpoint instead of the original Docker image.
the checkpoint and the checkpointed Docker image on the gration capability for stateful applications along with vertical

TABLE III: Average downtime of each benchmark at different stages (unit: seconds).
Benchmark         Size (MB)   Checkpoint creation   Image creation   Transfer   Resume
1. Blackscholes   74.5        <1                    1                1          <1
2. Canneal        63.36       1                     1                1          <1
3. Ferret         636.6       2                     13               16         <1
4. Fluidanimate   733.26      2                     12               17         <1
5. Freqmine       820.14      3                     13               20         <1
6. Swaptions      236.3       2                     6                7          <1
7. Streamcluster  412.6       2                     7                8          <1
8. DGEMM          101.2       1                     3                1          <1

V. RELATED WORK

The area of container migration is still in its infancy, but virtual machine migration is a well studied topic. There are many projects on scheduling and VM migration in the cloud with a focus on the effect of bin packing algorithms in the cloud [18], energy aware scheduling, and migration prediction for virtual machine migration in the cloud [19].

The pMapper [20] project studied continuous dynamic allocation and optimization of virtual machines (VMs). They have a similar approach to RUBAS, but in the context of virtual machines.

There are many projects that concentrate purely on vertical scaling in the context of VMs on various cloud platforms and explore different aspects of vertical scaling such as application-driven vertical scaling, vertical scaling across federated clouds, and scaling on specific platforms such as OpenStack [21], [22]. Han et al. [23] presented a lightweight resource allocation system that uses the resource utilization of applications running in the VM to inform scheduling decisions for VM allocation.

Other projects such as [24], [25] show the effects of using application usage to aid VM scheduling decisions. The positive effects of dynamic vertical and horizontal scaling of virtual machines and containers have also been studied [26]. Our work involves enabling vertical scaling for stateful applications, i.e., vertical scaling for containers where continuation of execution is important. DoCloud [27] presents a proactive method of performing vertical scaling of Docker containers that run web applications. They proactively vary the resource allocation for containers to attain higher utilization. Hadley et al. [28] have shown the capability of CRIU (Checkpoint Restore In Userspace) across multiple clouds. They show this capability for both stateful and stateless applications. However, we are not aware of any work that combines container migration capability for stateful applications with vertical scaling. This paper demonstrates this new capability and explores its efficacy when used in a Kubernetes environment.
VI. CONCLUSION

RUBAS provides an effective correction of resource allocation for applications running inside a Kubernetes cluster. We compare RUBAS with the current state of the art, the Kubernetes Vertical Pod Autoscaler (VPA), which also aims at providing accurate resource allocation. RUBAS provides many benefits over Kubernetes VPA by providing a non-disruptive method of auto-scaling and a more accurate estimation methodology, which together improve the cluster CPU and memory utilization by 10% and reduce the runtime by 15%. We see an overhead for each application runtime in the range of 5% to 20%, but the applications have near accurate allocations, which allows the cluster to be utilized by other applications. RUBAS also decreases the number of restarts compared to Kubernetes VPA. We also quantify the overhead of container migration in RUBAS at various stages of migration, conduct experiments to determine the best interval size for monitoring the container for the purpose of estimation, compare the allocation pattern of RUBAS with Kubernetes VPA, and evaluate the overhead of in-place scaling and remote node scaling. Overall, even with grossly inaccurate allocations, RUBAS is able to correct the allocations with far fewer corrections than Kubernetes VPA while providing much improved cluster utilization.

ACKNOWLEDGMENT

This research is partially supported by the National Science Foundation under Grant No. OAC-1740263.

REFERENCES

[1] "Cloud Native Computing Foundation." [Online]. Available: https://www.cncf.io/
[2] W. Lin, J. Z. Wang, C. Liang, and D. Qi, "A Threshold-based Dynamic Resource Allocation Scheme for Cloud Computing," Procedia Engineering, vol. 23, pp. 695–703, 2011. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1877705811054117
[3] F. Chang, J. Ren, and R. Viswanathan, "Optimal Resource Allocation in Clouds," in 2010 IEEE 3rd International Conference on Cloud Computing. IEEE, 2010, pp. 418–425. [Online]. Available: http://ieeexplore.ieee.org/document/5557966/
[4] V. K. Vavilapalli, S. Seth, B. Saha, C. Curino, O. O'Malley, S. Radia, B. Reed, E. Baldeschwieler, A. C. Murthy, C. Douglas, S. Agarwal, M. Konar, R. Evans, T. Graves, J. Lowe, and H. Shah, "Apache Hadoop YARN," in Proceedings of the 4th Annual Symposium on Cloud Computing - SOCC '13. ACM Press, 2013, pp. 1–16. [Online]. Available: http://dl.acm.org/citation.cfm?doid=2523616.2523633
[5] M. Schwarzkopf, A. Konwinski, M. Abd-El-Malek, and J. Wilkes, "Omega," in Proceedings of the 8th ACM European Conference on Computer Systems - EuroSys '13. ACM Press, 2013, p. 351. [Online]. Available: http://dl.acm.org/citation.cfm?doid=2465351.2465386
[6] "Apache Aurora." [Online]. Available: http://aurora.apache.org/
[7] "Marathon: A container orchestration platform for Mesos and DC/OS." [Online]. Available: https://mesosphere.github.io/marathon/
[8] "TORQUE Resource Manager." [Online]. Available: http://www.adaptivecomputing.com/products/open-source/torque/
[9] C. Delimitrou and C. Kozyrakis, "Quasar: Resource-Efficient and QoS-Aware Cluster Management." [Online]. Available: http://dx.doi.org/10.1145/2541940.2541941
[10] K. Kambatla, A. Pathak, and H. Pucha, "Towards optimizing hadoop provisioning in the cloud," in Proceedings of the 2009 Conference on Hot Topics in Cloud Computing. USENIX Association, 2009, p. 22.
[11] E. Cortez, A. Bonde, A. Muzio, M. Russinovich, M. Fontoura, and R. Bianchini, "Resource Central: Understanding and Predicting Workloads for Improved Resource Management in Large Cloud Platforms," 2017. [Online]. Available: https://doi.org/10.1145/3132747.3132772
[12] "Azure Public Dataset." [Online]. Available: https://github.com/Azure/AzurePublicDataset
[13] "Kubernetes Vertical Pod Autoscaler." [Online]. Available: https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler
[14] A. Wong, D. Rexachs, and E. Luque, "Parallel Application Signature for Performance Analysis and Prediction," IEEE Transactions on Parallel and Distributed Systems, vol. 26, no. 7, pp. 2009–2019, July 2015. [Online]. Available: http://ieeexplore.ieee.org/document/6827943/
[15] "Reference - Kubernetes." [Online]. Available: https://kubernetes.io/docs/reference/
[16] G. Rattihalli, P. Saha, M. Govindaraju, and D. Tiwari, "Two stage cluster for resource optimization with Apache Mesos," May 2019. [Online]. Available: http://arxiv.org/abs/1905.09166
[17] G. Rattihalli, "Exploring Potential for Resource Request Right-Sizing via Estimation and Container Migration in Apache Mesos," in 2018 IEEE/ACM International Conference on Utility and Cloud Computing Companion (UCC Companion). IEEE, December 2018, pp. 59–64. [Online]. Available: https://ieeexplore.ieee.org/document/8605758/
[18] Z. Zhang, L. Xiao, X. Chen, and J. Peng, "A Scheduling Method for Multiple Virtual Machines Migration in Cloud." Springer, Berlin, Heidelberg, 2013, pp. 130–142.
[19] V. Kherbache, E. Madelaine, and F. Hermenier, "Scheduling Live Migration of Virtual Machines," IEEE Transactions on Cloud Computing, p. 1, 2017. [Online]. Available: http://ieeexplore.ieee.org/document/8046068/
[20] A. Verma, P. Ahuja, and A. Neogi, "pMapper: Power and Migration Cost Aware Application Placement in Virtualized Systems." Springer, Berlin, Heidelberg, 2008, pp. 243–264.
[21] L. Lu, X. Zhu, R. Griffith, P. Padala, A. Parikh, P. Shah, and E. Smirni, "Application-driven dynamic vertical scaling of virtual machines in resource pools," in 2014 IEEE Network Operations and Management Symposium (NOMS). IEEE, 2014, pp. 1–9. [Online]. Available: http://ieeexplore.ieee.org/document/6838238/
[22] M. Turowski and A. Lenk, "Vertical Scaling Capability of OpenStack." Springer, Cham, 2015, pp. 351–362. [Online]. Available: https://www.springerprofessional.de/vertical-scaling-capability-of-openstack/2520776
[23] R. Han, L. Guo, M. M. Ghanem, and Y. Guo, "Lightweight Resource Scaling for Cloud Applications," in 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012). IEEE, 2012, pp. 644–651. [Online]. Available: http://ieeexplore.ieee.org/document/6217477/
[24] D. A. Bacigalupo, J. van Hemert, X. Chen, A. Usmani, A. P. Chester, L. He, D. N. Dillenberger, G. B. Wills, L. Gilbert, and S. A. Jarvis, "Managing dynamic enterprise and urgent workloads on clouds using layered queuing and historical performance models," Simulation Modelling Practice and Theory, vol. 19, no. 6, pp. 1479–1495, 2011. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1569190X11000153
[25] J. Z. Li, J. Chinneck, M. Woodside, and M. Litoiu, "Fast scalable optimization to configure service systems having cost and quality of service constraints," in Proceedings of the 6th International Conference on Autonomic Computing - ICAC '09. New York, NY, USA: ACM Press, 2009, p. 159. [Online]. Available: http://portal.acm.org/citation.cfm?doid=1555228.1555268
[26] P. Hoenisch, I. Weber, S. Schulte, L. Zhu, and A. Fekete, "Four-Fold Auto-Scaling on a Contemporary Deployment Platform Using Docker Containers." Springer, Berlin, Heidelberg, 2015, pp. 316–323.
[27] C. Kan, "DoCloud: An elastic cloud platform for Web applications based on Docker," in 2016 18th International Conference on Advanced Communication Technology (ICACT). IEEE, 2016, p. 1. [Online]. Available: http://ieeexplore.ieee.org/document/7423439/
[28] J. Hadley, Y. Elkhatib, G. Blair, and U. Roedig, "MultiBox: Lightweight Containers for Vendor-Independent Multi-cloud Deployments." Springer, Cham, 2015, pp. 79–90.
