

Adaptive Containerization for Microservices in Distributed Cloud Systems
Nishant Deepak Keni and Ahan Kak
School of Electrical and Computer Engineering,
Georgia Institute of Technology, Atlanta, GA 30332, USA
Email: nishant.keni@gatech.edu; ahan@ece.gatech.edu

Abstract—The traditional monolithic on-premises model of application deployment is fast being replaced by a cloud-based microservices paradigm, driven in part by the rise of numerous cloud infrastructure providers offering seamless access to a variety of computing hardware, and by the need for applications to serve an ever-increasing audience, which necessitates scalability. While container-based virtualization has been the preferred method of microservice deployment, Cloud Consumers have not had much opportunity for cost and resource optimization thus far. To this end, this paper introduces a resource allocation framework for the containerized deployment of microservices, called Adaptive Containerization for Microservices in Distributed Cloud Systems, which helps reduce operating costs while ensuring a minimum guaranteed level of service. Further, a variety of performance evaluation metrics have been provided to reinforce the validity of the proposed framework.

Index Terms—microservices architecture, containerization, cloud-native, adaptive resource allocation

I. INTRODUCTION

In recent years there has been a paradigm shift in software architecture design from an on-premises model to a cloud-native approach [1]. This change has been largely driven by the demand for greater system reliability, scalability, and flexibility. However, traditional applications are monolithic in nature, i.e., built as a single unit, primarily consisting of a database, a client-side interface, and a server-side application. This monolithic architecture suffers from a number of drawbacks:

• System updates are cumbersome as the developer must re-deploy the entire server-side application each time.
• Failure diagnosis is challenging since the monolithic application operates as a single unit.
• Scalability is difficult to achieve, leading to resource wastage.

Consequently, a monolithic application architecture cannot leverage the benefits of a cloud-native approach. To this end, the microservices architecture [2] has emerged as a suitable candidate for cloud-based software deployment. The microservices approach is based on the decomposition of complex software systems into multiple independent services, each with its own system logic and data store. Microservices can be deployed independently of each other and communicate either through standardized interfaces or event messages, making them amenable to deployment through virtualization. Resource virtualization can be achieved primarily through the use of either hypervisors or containers [3].

Hypervisor-based virtualization allows for the provisioning of multiple isolated virtual machines (VMs) over the same physical hardware [4, §2.3.3]. This approach allows each VM instance to have its own unique operating system (OS) with its own binaries, libraries, and applications. Hypervisors can further be categorized into: (i) Type 1 hypervisors that operate on top of the host's hardware, and (ii) Type 2 hypervisors that operate on top of the host's OS. The principal drawback of the hypervisor-based approach is the high overhead associated with maintaining a full OS within each VM instance.

On the other hand, containerization [5] is a form of virtualization that attempts to achieve resource isolation with minimal overhead by sharing the kernel with the host OS. The use of containerization allows us to implement many of the desirable features associated with cloud platforms: elasticity, reliability, and ease of management. Therefore, we focus on container-based virtualization in this paper. Microservices can be conveniently packaged into containers that are then deployed onto physical hardware, thus ensuring a consistent software execution environment from the developer to the consumer. The use of containerization also allows for inherent scalability and creates a redundancy mechanism for machine failure, as container instances can be added and removed on demand.

While the decomposition of a monolithic application into a set of microservices is the prerogative of the application developer, the deployment of microservices onto physical hardware is a run-time exercise. Traditionally, microservices have been deployed on single-vendor clouds in accordance with their resource requirements without much regard for either resource or cost optimization. However, the emergence of Cloud Brokers [6], [7] has provided consumers with the opportunity to optimize their usage of cloud infrastructure. The Cloud Broker serves as an intermediary between the Cloud Consumers who wish to deploy applications, and the Cloud Providers who own the underlying infrastructure, as described in Section II.

Accordingly, there have been a number of attempts at performing optimal deployment of microservices over cloud infrastructure [8]–[13]. More specifically, the work presented in [8] proposes a dynamic CPU resource allocation framework for Docker containers that aims to reduce resource over-utilization.


However, while the presented framework implements adaptive control, it is restricted to the management of computing resources only, and does not consider other vital metrics associated with service-level agreement (SLA) requirements, such as those related to latency or reliability. In a similar vein, the framework introduced in [9] presents a linear programming based approach to microservice deployment using Docker. However, here too, the presented model does not take into account service reliability. Within the context of this paper, we define reliability as a measure of the hardware infrastructure failure rate.

Further, the authors in [10] have presented a genetic algorithm for optimizing container allocation in the cloud; however, the model presented in [10] does not take into account latency considerations associated with service delivery. Latency constraints are essential for ensuring timely delivery of services. In addition, we note that the large convergence time of genetic algorithms is undesirable in the context of run-time resource allocation. At the same time, [11] can be regarded as one of the more comprehensive works in this domain, presenting both a container as well as a VM model, along with limited support for quality of service (QoS) metrics. On the other hand, the SmartVM platform introduced in [12] puts forth a two-tier classification of microservices based on their functional logic and performs dynamic container resource management in a manner that satisfies the applications' QoS requirements. However, we note that [12] does not discuss the mapping of containers to the underlying physical substrate. In [13], S. Chen et al. introduce QoS-aware resource partitioning for microservices, allowing latency-critical microservices to share the same physical node while still meeting their QoS requirements. While the presented work is pioneering in terms of resource allocation within a single physical machine, it does not delve into resource allocation across a cluster of such machines. We note that within the broader domain of cloud-based infrastructure, the resource allocation framework should be scalable across multiple distributed machines.

To this end, we note the following drawbacks in the prior art: (i) absence of service reliability constraints, (ii) absence of latency constraints, and (iii) lack of consideration for distributed cloud infrastructure. With a view to overcoming the aforementioned shortcomings, in this paper we introduce a resource allocation framework for Cloud Brokers called Adaptive Containerization for Microservices in Distributed Cloud Systems (ACMDC), which helps reduce operating costs while ensuring a minimum guaranteed level of service, i.e., the SLA. It is anticipated that through the use of ACMDC, Cloud Brokers will be able to streamline resource usage and costs, in turn providing Cloud Consumers with a greater degree of flexibility in their choice of cloud infrastructure.

To the best of our knowledge, this work is the first to present an SLA-based resource allocation framework for containerized microservices that takes into account both service latency as well as reliability. The rest of this paper is organized as follows. Section II introduces the system model, the problem formulation, and our approach to the solution. Section III presents a comprehensive performance evaluation based on metrics relating to resource utilization, costs, and reliability, while Section IV concludes the paper.

Fig. 1. The cloud architecture under consideration [14]. (The figure depicts the Cloud Consumer, Cloud Auditor, Cloud Broker, and Cloud Provider within the overall cloud architecture.)

II. SYSTEM MODEL

In this section, first, we provide the necessary background on the cloud architecture under which ACMDC operates, along with the primary stakeholders in our system. Then, we provide the detailed problem formulation that mathematically characterizes the entities under consideration, along with the objective and constraints. Finally, we present our approach to solving the resource allocation problem that has been posed.

A. Preliminaries

ACMDC operates within the realm of the general cloud computing architecture [14] shown in Fig. 1. The four primary stakeholders within the cloud domain are the:

• Cloud Consumer: the entity that seeks to deploy applications over cloud infrastructure owned by Cloud Providers,
• Cloud Provider: the entity that owns the hardware infrastructure used to deploy services,
• Cloud Broker: the entity that negotiates the relationship between Cloud Consumers and Cloud Providers, and
• Cloud Auditor: the entity that assesses and benchmarks the performance of Cloud Providers.

In what is known as the Infrastructure-as-a-Service (IaaS) model, the Cloud Consumer contracts with the Cloud Broker to deploy their applications in the cloud. The Cloud Broker typically leases cloud infrastructure from Cloud Providers in a wholesale manner. Therefore, the Cloud Broker is responsible for the containerization of each application's constituent microservices, and for the deployment of these containers onto the leased infrastructure.

B. Problem Formulation

As shown in Fig. 2, the physical hardware substrate represents the physical machines leased by the Cloud Broker. A physical machine p is characterized by its computing resource capacity (r(p)), cost per unit of computing resource (c(p)), operational cost (m(p)) incurred by the broker, link latency to the broker (l(p)), and probability of failure (f(p)). In other words, a physical machine p is represented by the tuple {r(p), c(p), m(p), l(p), f(p)}.
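To make the notation concrete, the machine tuple can be mirrored directly in code. The following is a minimal sketch in Python; the two machines and all parameter values are hypothetical, chosen only to illustrate the structure.

from dataclasses import dataclass

@dataclass
class PhysicalMachine:
    r: float  # computing resource capacity r(p)
    c: float  # cost per unit of computing resource c(p)
    m: float  # operational cost m(p)
    l: float  # link latency to the broker l(p)
    f: float  # probability of failure f(p)

# Hypothetical substrate of two leased machines
substrate = {"p1": PhysicalMachine(r=100, c=1.0, m=5.0, l=0.05, f=0.2),
             "p2": PhysicalMachine(r=80, c=1.5, m=4.0, l=0.02, f=0.1)}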

Fig. 2. Containerized approach to microservices. (The figure depicts the Cloud Consumer's applications being decomposed into microservices, containerized by the Cloud Broker, and deployed as containers onto the Cloud Provider's physical hardware substrate.)

Further, a binary variable o(p) is used to indicate the operational status of machine p, i.e.,

$$o(p) = \begin{cases} 1, & \text{if at least one container is deployed on } p;\\ 0, & \text{otherwise.} \end{cases} \quad (1)$$

The set of physical machines, P, serves as the hardware substrate for container deployment. Each cloud application a is composed of a set of loosely coupled microservices which perform a specific set of tasks, i.e., the microservices that form the application are all independent of each other, and exchange messages through a message broker deployed at the Cloud Broker's premises.

In decomposing an application into a set of microservices, we introduce the following quantities: w(a, u), which denotes the number of requests that microservice u must fulfill for application a; t(a, u), which denotes the time required by microservice u per unit computing resource to fulfill a request for application a; and T(a, u), which denotes the execution deadline for all requests served by microservice u for application a. Further, as part of the SLA requirement, a Cloud Consumer imposes a service latency requirement, S(a, u), on the Cloud Broker, for every microservice u that serves application a, along with a cost penalty P(a, u) that the broker must bear for every lost request of application a that is served by microservice u. Finally, we note that containerization is done on a per-request basis, i.e., for every request a new container is instantiated and deployed on the hardware substrate. This approach ensures maximum resiliency and allows for quick scaling of resources.

Next, the minimum amount of computing resource at physical machine p, r(p, a, u), required to meet the execution deadline T(a, u) is given by

$$r(p, a, u) = \frac{x(p, a, u)\, t(a, u)}{T(a, u)} \quad \forall\, p \in P,\ a \in A,\ u \in U, \quad (2)$$

where x(p, a, u) is the number of containers of microservice u deployed on machine p that serve application a. Since a physical machine can only support a limited number of containers, we enforce a machine capacity constraint as follows:

$$\sum_{a \in A} \sum_{u \in U} r(p, a, u) \leq r(p) \quad \forall\, p \in P. \quad (3)$$

Further, to ensure that the total number of containers matches the total number of requests, we have

$$\sum_{p \in P} x(p, a, u) = w(a, u) \quad \forall\, a \in A,\ u \in U, \quad (4)$$

and the relationship between the indicator variable o(p) and decision variable x(p, a, u) is given by

$$\sum_{a \in A} \sum_{u \in U} x(p, a, u) \leq M\, o(p) \quad \forall\, p \in P, \quad (5)$$

where M is a large positive real number. As mentioned previously, the framework presented herein also takes into account the SLA between the Cloud Consumer and the Cloud Broker; accordingly, in order to meet the service latency requirement, we introduce the following constraint:

$$T(a, u) + \sum_{p \in P} x(p, a, u)\, l(p) \leq S(a, u) \quad \forall\, a \in A,\ u \in U. \quad (6)$$
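As a quick numeric illustration of constraints (2), (3), and (6), consider a hypothetical machine with a capacity of 100 resource units and a link latency of 0.02 time units; the short Python sketch below checks a candidate placement against these constraints (all values are assumptions chosen for illustration).

def min_resource(x, t, T):
    """r(p,a,u) = x(p,a,u) * t(a,u) / T(a,u), per Eq. (2)."""
    return x * t / T

# 30 containers of a microservice with t(a,u) = 2.0 time units per unit
# resource and execution deadline T(a,u) = 1.0 require:
x, t_au, T_au = 30, 2.0, 1.0
r_pau = min_resource(x, t_au, T_au)   # 60 resource units
assert r_pau <= 100                   # capacity check, Eq. (3)

# Latency check, Eq. (6): execution deadline plus aggregate link latency
l_p, S_au = 0.02, 2.0
assert T_au + x * l_p <= S_au         # 1.0 + 0.6 <= 2.0, requirement met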

Equation (6) takes into account both the execution deadline as well as the time it takes to communicate the microservice output back to the message broker. The second SLA metric we take into consideration is system reliability. Since the probability of machine failure, f(p), is a Bernoulli random variable, the expected number of failed (or lost) requests at machine p is given by $\sum_{a \in A} \sum_{u \in U} x(p, a, u) f(p)$, and the expected value of the associated overall cost penalty can be expressed as

$$v(p) = \sum_{a \in A} \sum_{u \in U} x(p, a, u)\, P(a, u)\, f(p) \quad \forall\, p \in P. \quad (7)$$

On the other hand, the operating cost and resource cost per machine p are given as

$$u(p) = m(p)\, o(p) + \sum_{a \in A} \sum_{u \in U} r(p, a, u)\, c(p) \quad \forall\, p \in P. \quad (8)$$

Finally, the optimization problem (OP) associated with ACMDC can then be expressed as

$$\begin{aligned} (\mathrm{OP}) \quad \min_{x(p,a,u),\, o(p)} \quad & \sum_{p \in P} \big( w_u\, u(p) + w_v\, v(p) \big) \\ \text{s.t.} \quad & (3)\text{--}(6), \\ & o(p) \in \{0, 1\} \quad \forall\, p \in P, \\ & x(p, a, u) \in \mathbb{Z}^{+} \quad \forall\, p \in P,\ a \in A,\ u \in U, \end{aligned}$$

with $w_u + w_v = 1$, $0 \leq w_u \leq 1$, and $0 \leq w_v \leq 1$.

Algorithm 1 Adaptive Containerization for Microservices in Distributed Cloud Systems (ACMDC)
1: initialization: process existing event batch
2: while new event do
3:   check event type
4:   if new application event then
5:     solve OP with migration disabled
6:     solve OP with migration enabled
7:     check difference in cost outlay
8:     if cost difference > migration cost then
9:       accept solution with migration enabled
10:      perform migration
11:    else
12:      accept solution with migration disabled
13:    end if
14:  else
15:    check feasibility of existing container deployment
16:    if existing deployment not feasible then
17:      solve OP with migration enabled
18:      accept solution with migration enabled
19:      perform migration
20:    else
21:      solve OP with migration enabled
22:      check difference in cost between existing and proposed deployment
23:      if cost difference > migration cost then
24:        accept solution with migration enabled
25:        perform migration
26:      else
27:        maintain existing deployment
28:      end if
29:    end if
30:  end if
31: end while
C. System Operation
ACMDC follows a reactive event-driven model which allows us to maintain high elasticity with low complexity. Examples of such events include the arrival of a new application request at the Cloud Broker, a change in the parameters of an existing application, and a change in the parameters of the underlying hardware substrate. Accordingly, there are two major event classes: new application events and change events. As shown in Algorithm 1, on receiving an event notification, the Cloud Broker first checks the event type. For new application events, OP is solved twice, first under the assumption that the existing applications cannot be migrated, and a second time with the aforementioned assumption relaxed. Clearly, the second approach is optimal, but it entails a migration cost associated with moving containers across physical machines. However, if the cost difference between the two does not exceed the migration cost, the Cloud Broker utilizes the solution offered by the first approach and does not perform migration. On the other hand, for change events, the Cloud Broker first checks the validity of the existing container deployment; if the existing deployment is no longer feasible, OP is solved with the new parameters, and the Cloud Broker performs container migration. If, instead, the existing deployment is deemed feasible, migration is only performed if the cost difference exceeds the migration cost. In this manner, the system is able to seamlessly adapt to different events.
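The control flow of Algorithm 1 can be summarized in a few lines of Python. In this sketch, solve_op stands in for an OP solver such as the one outlined in Section II-B, and the event and state objects are hypothetical placeholders for the broker's event queue and deployment state.

def handle_event(event, state, solve_op, migration_cost):
    if event.kind == "new_application":
        fixed = solve_op(state, allow_migration=False)   # Alg. 1, line 5
        moved = solve_op(state, allow_migration=True)    # Alg. 1, line 6
        # Migrate only if the cost saving exceeds the migration cost
        if fixed.cost - moved.cost > migration_cost:
            return moved
        return fixed
    else:  # change event
        if not state.deployment_feasible():              # Alg. 1, line 16
            return solve_op(state, allow_migration=True)
        moved = solve_op(state, allow_migration=True)    # Alg. 1, line 21
        if state.current_cost() - moved.cost > migration_cost:
            return moved
        return state.current_deployment()                # Alg. 1, line 27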
III. PERFORMANCE EVALUATION

In this section, we evaluate the performance of the ACMDC framework by considering a real-world cloud deployment scenario and comparing the results obtained against the deployment strategies Spread, Random, and BinPack used by one of the most popular container platforms today, Docker [15]. More specifically, concerning these three strategies, we note the following (a schematic sketch of all three follows the list):

• Spread: Under the Spread strategy, the physical machine with the least number of active containers is chosen for container deployment.
• Random: The Random strategy does not seek to optimize for any performance metric; instead, machines are chosen at random for container deployment.
• BinPack: BinPack follows the classical bin packing approach [16, §18], wherein the system prioritizes physical machines with the most existing containers.
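For reference, the three baseline strategies admit very compact renderings. The Python functions below are schematic stand-ins written for this comparison, not Docker's actual scheduler code.

import random

def spread(machines, containers):
    """Pick the machine with the fewest active containers."""
    return min(machines, key=lambda p: containers[p])

def random_choice(machines):
    """Pick a machine uniformly at random."""
    return random.choice(list(machines))

def binpack(machines, containers, capacity, demand):
    """Pick the fullest machine that can still fit the container."""
    feasible = [p for p in machines if containers[p] + demand <= capacity[p]]
    return max(feasible, key=lambda p: containers[p]) if feasible else None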

Fig. 3. Number of active machines relative to the total available machines.
Fig. 4. Number of active machines relative to the number of requests.
Fig. 5. Normalized cost outlay relative to the number of requests.
Fig. 6. Number of requests relative to the machine failure probability.

The first performance metric we take into consideration is the number of active machines vs. the total number of physical machines, as shown in Fig. 3. Fig. 3 characterizes the system response to an increasing number of nodes in the physical substrate. For a fixed set of 10,000 requests, we see that both the Spread and Random strategies use up an increasing number of machines, which is sub-optimal from an operating cost perspective; on the other hand, ACMDC and BinPack both maintain a constant 10 active machines. This result can be attributed to the fact that 10 physical machines are sufficient to meet the requirements of the 10,000 requests under consideration while achieving minimal cost outlay.

Fig. 4 represents the increase in the number of active machines as the total number of requests increases from 50 to 10,000, for a substrate of 10 machines. As expected, Spread makes use of all 10 machines; Random eventually converges to 10 machines too, as the volume of requests increases. Further, while both ACMDC and BinPack utilize fewer than 10 machines, ACMDC edges out BinPack by requiring a smaller number of active machines to serve a maximum of 10,000 requests. For example, while BinPack requires 8 machines to serve 10,000 requests, ACMDC is able to serve the same number of requests with 7 physical systems.

In a similar vein, Fig. 5 characterizes the normalized cost outlay associated with the containerization and deployment of microservices, and further demonstrates a major advantage offered by ACMDC. The costs shown in this figure have been obtained by increasing the number of microservice requests from 50 to 10,000, and then normalizing the result by the highest cost value, which happens to be achieved by the Spread strategy. From the figure we see that, as the number of requests scales to 10,000, ACMDC is able to achieve cost savings of over 40% compared to Spread and Random, and nearly 35% compared to BinPack. The results thus obtained further reinforce the superiority of ACMDC, which achieves the lowest cost outlay.

Finally, system reliability has been characterized in Fig. 6, which represents the failure probability of the requests. While the service latency requirement is enforced as a constraint, system reliability forms a part of the objective. Ideally, we would like to place a majority of the containers on machines with a low probability of failure f(p); however, the resource cost c(p) and the operating cost m(p) serve as impediments to this idealized deployment. In other words, since system reliability is characterized by a cost penalty and the problem objective deals with minimizing the overall system cost, we are willing to accept a slightly less reliable system in exchange for a lower overall cost. For example, as shown in Fig. 6, we see that ACMDC favors machines with f(p) = 0.2 over those with f(p) = 0.1; however, on further inspection of the substrate parameters, we find that this result is the direct outcome of the higher per-unit resource cost associated with more reliable machines. At the same time, we note that while the performance evaluation in this section accords equal importance to u(p) and v(p), a Cloud Broker is free to adjust the weights w_u and w_v based on their preferences.
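The effect of adjusting these weights can be seen in a toy comparison. The following sketch uses entirely hypothetical per-machine values for u(p) and v(p), chosen only to show how the preferred machine flips as reliability is weighted more heavily.

def weighted_cost(wu, wv, u_p, v_p):
    # Per-machine term of the (OP) objective: wu * u(p) + wv * v(p)
    return wu * u_p + wv * v_p

# Machine A: reliable (f = 0.1) but expensive; machine B: cheaper, f = 0.2
uA, vA = 12.0, 1.0   # u(p): operating + resource cost; v(p): expected penalty
uB, vB = 9.0, 2.0

for wu in (0.5, 0.2):
    wv = 1 - wu
    better = "A" if weighted_cost(wu, wv, uA, vA) < weighted_cost(wu, wv, uB, vB) else "B"
    print(f"wu={wu}: machine {better} preferred")
# With equal weights, B wins (cost dominates); weighting reliability more
# heavily (wv = 0.8) flips the choice to the more reliable machine A.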
IV. CONCLUSION
In this paper, we have presented an optimal containerization
framework for the cloud-based deployment of microservices,
ACMDC. ACMDC helps Cloud Brokers minimize their total
cost outlay while meeting the SLA requirements set forth by
Cloud Consumers. As part of ACMDC, we have developed
mathematical representations for the physical substrate, ap-
plications, and microservices, and introduced constraints that
accurately reflect the cloud domain. ACMDC itself has been
presented as a reactive event-driven framework that enables
elasticity with low complexity. Finally, we have validated the
superior performance of ACMDC when compared to the state-
of-the-art deployment strategies based on metrics relating to
the number of active physical machines, the total cost outlay,
and system reliability.
REFERENCES

[1] A. Wu, "Taking the Cloud-Native Approach with Microservices," Magenic/Google Cloud Platform, Tech. Rep., 2017.
[2] M. Amundsen and M. McLarty, Microservice Architecture. O'Reilly Media, Inc., USA, 2016.
[3] R. Morabito, J. Kjallman, and M. Komu, "Hypervisors vs. Lightweight Virtualization: A Performance Comparison," in 2015 IEEE International Conference on Cloud Engineering, Mar. 2015.
[4] D.-N. Le, R. Kumar, G. N. Nguyen, and J. M. Chatterjee, Cloud Computing and Virtualization. John Wiley & Sons, 2018.
[5] C. Pahl, "Containerization and the PaaS cloud," IEEE Cloud Computing, vol. 2, no. 3, pp. 24–31, May 2015.
[6] M. Aazam and E.-N. Huh, "Cloud broker service-oriented resource management model," Transactions on Emerging Telecommunications Technologies, vol. 28, no. 2, pp. 2937–2953, Apr. 2015.
[7] S. S. Chauhan, E. S. Pilli, R. Joshi, G. Singh, and M. Govil, "Brokering in interconnected cloud computing environments: A survey," Journal of Parallel and Distributed Computing, Aug. 2018.
[8] J. Monsalve, A. Landwehr, and M. Taufer, "Dynamic CPU Resource Allocation in Containerized Cloud Environments," in 2015 IEEE International Conference on Cluster Computing, Sep. 2015.
[9] X. Wan, X. Guan, T. Wang, G. Bai, and B.-Y. Choi, "Application deployment using Microservice and Docker containers: Framework and optimization," Journal of Network and Computer Applications, vol. 119, pp. 97–109, Oct. 2018.
[10] C. Guerrero, I. Lera, and C. Juiz, "Genetic Algorithm for Multi-Objective Optimization of Container Allocation in Cloud Architecture," Journal of Grid Computing, vol. 16, no. 1, pp. 113–135, Nov. 2017.
[11] M. Nardelli, C. Hochreiner, and S. Schulte, "Elastic Provisioning of Virtual Machines for Container Deployment," in Proceedings of the 8th ACM/SPEC International Conference on Performance Engineering Companion (ICPE '17 Companion), 2017.
[12] T. Zheng, X. Zheng, Y. Zhang, Y. Deng, E. Dong, R. Zhang, and X. Liu, "SmartVM: a SLA-aware microservice deployment framework," World Wide Web, vol. 22, no. 1, pp. 275–293, May 2018.
[13] S. Chen, C. Delimitrou, and J. F. Martínez, "PARTIES: QoS-Aware Resource Partitioning for Multiple Interactive Services," in Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '19), 2019.
[14] F. Liu, J. Tong, J. Mao, R. Bohn, J. Messina, L. Badger, and D. Leaf, "NIST Cloud Computing Reference Architecture," Tech. Rep., 2011.
[15] Docker, Inc. (2018) Docker Swarm Scheduling Strategies. [Online]. Available: https://github.com/docker/
[16] B. Korte and J. Vygen, Combinatorial Optimization: Theory and Algorithms. Springer-Verlag Berlin Heidelberg, 2006.
