
CAPSTONE PAPER SUMMARY:

CHARACTERIZING MICROSERVICE DEPENDENCY AND PERFORMANCE: ALIBABA TRACE ANALYSIS

What does Alibaba Cloud provide?


Alibaba Cloud offers a full range of cloud products and services for databases, networking,
security, analytics & big data, domains & website management, application services,
media services, middleware, and more.

What is the meaning of off-the-shelf?


Available as a stock item; not specially designed or custom-made.

What are upstream and downstream systems in microservices?


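An upstream system is any system that sends data to the system in question (in the quoted
definition, the Collaboration Server system). A downstream system is a system that receives
data from it. In a microservice call graph, the upstream microservice (UM) is the caller and
the downstream microservice (DM) is the callee.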

What are stateless and stateful systems in services?


A stateless system sends a request to the server and relays the response (or the state) back without
storing any information. On the other hand, stateful systems expect a response, track information, and
resend the request if no response is received.

What are bare-metal nodes?


A bare-metal instance is an instance created directly on a physical machine, without any
virtualization layer running underneath it.
////////////////////////////////////////////////////////////////////////////////////////////////////

CHARACTERIZING MICROSERVICE DEPENDENCY AND PERFORMANCE: ALIBABA TRACE ANALYSIS

INTRODUCTION :

The study focuses on characterizing microservice dependency and runtime performance. It
presents an in-depth anatomy of microservice call graphs to quantify how they differ from the
traditional DAGs of data-parallel jobs.

It is observed that the sizes of microservice call graphs follow a heavy-tailed distribution.

Analysis of microservice runtime performance indicates that most microservices are much more
sensitive to CPU interference than to memory interference.

Leading cloud service companies such as AWS and Alibaba have started to provide off-the-shelf
microservice architectures to ease application deployment for their users.

This paper presents a comprehensive analysis of the large-scale deployment of microservices in
production clusters. It analyzes the behaviour of more than 20,000 microservices over a 7-day
period and profiles their characteristics, including the anatomy of dynamic call graphs, the
characterization of microservice dependency, and runtime performance analysis.

We build a stochastic model to simulate the dynamic dependencies between microservices.

1.

Many benchmarks such as Acme Air, μSuite and DeathStarBench explore the major characteristics
of microservices, but these studies only provide insights into relatively small-scale clusters. The
Alibaba paper aims to make a comprehensive analysis of the large-scale deployment of
microservices in production.

2.

The paper analyzes the behaviour of more than 20,000 microservices over a 7-day period and
profiles their characteristics, including the anatomy of dynamic call graphs and microservice
dependencies, along with runtime performance analysis.

FINDINGS OF THE PAPER

1. Microservice call graphs are substantially different from traditional DAGs of data-parallel jobs.

.The size of microservice call graphs is heavy-tailed.

.About 10% of call graphs consist of more than 40 microservice stages; the size follows a
heavy-tailed distribution.
.Data-parallel jobs contain only a few stages.

.The call graph is similar to a tree; in Alibaba, the majority of nodes have an in-degree of one.

.About 5% of microservices are multiplexed by more than 90% of online services in the Alibaba cluster.

.In extreme cases the same online service can have more than 9 classes of topologically different
graphs.

.Traditional DAGs are static and do not change after a job is submitted.

2. Strong dependency between microservices provides good opportunities for optimization of
microservice designs.

.The interface of an incoming call to an upstream microservice (UM) is the same as the reply
interface of the UM for its downstream microservice (DM) to call back.

.Coupling these two interfaces together can avoid unnecessary deadlocks.

.For microservice pairs that have a strongly coupled dependency, the UM will call the DM
multiple times whenever the UM is called by others.

.Coupling these interfaces will greatly help reduce the communication load.

3. Microservice performance is much more sensitive to CPU interference than to memory interference

.There are hundreds of containers co-located with batch applications on multiple
physical hosts.

.Utilization in these containers can be as low as 10%, and the resulting response time (RT) does
not vary much. CPU interference, by contrast, can hurt RT performance.

.A host CPU utilization of 30% can degrade the average RT by 20% compared with a utilization of
10%, so there is a strong demand for more efficient job schedulers.

4. Stochastic models can simulate dynamic microservice call graph quite well

.In existing microservice benchmarks, the number of microservices does not exceed 40.

.Another limitation: each service keeps a fixed call graph topology that does not change, even in
benchmarks that have multiple online services.

.A stochastic model that simulates microservice call graphs is created to mitigate these limitations
and to generate microservice traces on a much larger scale while preserving the graph properties
observed in the Alibaba traces.
CONTRIBUTIONS IN THIS PAPER
1. A comprehensive study of the large-scale deployment of microservices in a production cluster. It
covers both the structural properties of microservice call graphs and microservice call dependencies.

2. A characterization of microservice runtime performance, which provides deep insights into
microservice scheduling and resource management.

3. A graph model built to efficiently generate microservice traces on a large scale.

4. A theoretical analysis to characterize structural properties in the simulated graphs.

2 MICROSERVICE BACKGROUND AND ALIBABA OVERVIEW


A microservice usually runs in multiple containers to serve users' requests. The request from the user
is called an origin request. It is first sent to an entering microservice, which then triggers a series of
calls between related microservices.

Fig 1(a): The user issues an origin web request via HTTP to entering microservice A, which is a
front-end web service. When replying to the origin request, microservice A calls its DMs, which in
turn call their own downstream microservices.

Call graph contains multiple calls between different pairs of microservices.

A pair of microservices contains one upstream microservice (UM) and one downstream
microservice (DM).

Microservices can be categorized into two types, namely, stateless (a circle in Fig. 1(a)) and stateful
(a rectangle or hexagon).

1. Stateless services are isolated from state data while stateful services such as databases [8] and
Memcached [13] need to store data.

2. Stateful services often provide a small number of uniform query interfaces such as reading or
writing data, while stateless services tend to provide tens to hundreds of evolving interfaces for
different purposes.
Three types of communication between a pair of microservices:

1. Inter-process communication (IPC): between stateless and stateful microservices.

2. Remote invocation: remote procedure call (RPC), a two-way communication under which a DM
must return a result to its corresponding UM.

3. Indirect communication: message queue (MQ), a one-way communication. Under such
communication the UM sends a message to a third entity, which persists the message for
reliability, and the DM fetches the message on demand from the third entity directly, without a reply.

Remote invocation has high efficiency, while indirect communication maintains good flexibility.
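The difference between the two-way and one-way styles can be sketched in a few lines of Python. This is only an illustration: `rpc_call` and `MessageBroker` are hypothetical names, and the in-memory queue stands in for the third entity without actually persisting anything.

```python
import queue


def rpc_call(dm_handler, request):
    """Two-way call: the UM waits until the DM returns a result."""
    return dm_handler(request)


class MessageBroker:
    """Hypothetical stand-in for the third entity between UM and DM.

    A real message queue would persist messages; this sketch only keeps
    them in memory.
    """

    def __init__(self):
        self._messages = queue.Queue()

    def publish(self, message):
        # The UM hands the message over and continues; no reply is expected.
        self._messages.put(message)

    def fetch(self):
        # The DM pulls the next message on demand.
        return self._messages.get_nowait()


# RPC: the caller gets the result back directly.
reply = rpc_call(lambda req: req.upper(), "ping")

# MQ: the sender and the receiver never talk to each other directly.
broker = MessageBroker()
broker.publish("order-1")
received = broker.fetch()
```

In the RPC case the UM's response time includes the DM's processing time; in the MQ case it does not, because the UM never waits for the DM.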

Fig 1(b): The call depth (aka the number of tiers) is defined as the length of the longest path in a
call graph.
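As a minimal sketch of this definition, the function below computes the call depth of one call graph given as a list of (UM, DM) pairs. The function name, the input layout, and the choice to count tiers (nodes on the path, so a lone entering microservice has depth 1) are assumptions made for illustration.

```python
from collections import defaultdict


def call_depth(calls, entry):
    """Number of tiers on the longest UM -> DM path that starts at `entry`.

    `calls` is a list of (um, dm) pairs belonging to one call graph.
    """
    children = defaultdict(list)
    for um, dm in calls:
        children[um].append(dm)

    def depth(node, on_path):
        best = 0
        for child in children[node]:
            # Skip nodes already on the current path so a cycle cannot loop forever.
            if child not in on_path:
                best = max(best, depth(child, on_path | {child}))
        return 1 + best

    return depth(entry, {entry})


# Entering microservice A calls B and C, which each query a Memcached.
calls = [("A", "B"), ("A", "C"), ("B", "MC1"), ("C", "MC2")]
print(call_depth(calls, "A"))  # 3
```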

Response time (RT) of a call: the length of the interval from the UM calling its DM to the UM
receiving the response.

Since an indirect communication does not need to return a result, the RT of an origin request is
dominated by the part associated with its user.

The same class of user requests can trigger different microservice call procedures and thus incur
heterogeneous RTs.

ALIBABA TRACE OVERVIEW


Physical running environment:

The Alibaba cluster adopts Kubernetes to manage its bare-metal cloud and relies on the
hardware-software hybrid virtio I/O system to enhance cluster performance and achieve better
isolation between different services.

Online services run in containers that are managed by Kubernetes directly.
For batch jobs, Kubernetes first allocates a certain number of pods, which are then handed to Fuxi,
a scheduler for batch jobs in Alibaba, for further scheduling. Each pod runs in secure containers to
process the batch jobs.

Microservices system metrics.

Alibaba uses the Application Real-Time Monitoring Service (ARMS) system, which is similar to
Dapper, to collect microservice traces.

The microservice monitoring system collects several system metrics for each container every
minute and records their average. These metrics range from the hardware layer, such as
cache misses per kilo instructions and cycles per instruction, to the OS layer, including CPU
utilization and memory utilization, and also contain application-layer indices such as Java virtual
machine (JVM) metrics.

Values denote the exact numbers of these metrics, and Timestamp is the time when the corresponding
metric is collected. Pod IP is the IP address of the pod in which a microservice is deployed.

Microservice invocations in a call graph.

All invocations between microservices triggered by the same user request share one unique Trace ID,
which is the identifier of a call graph.

A UM calls a DM via a specific interface. The IP addresses of the pods holding them are recorded by
the system as well, i.e., UM Pod_IP and DM Pod_IP. The trace also contains the RT of each call.

Each call is identified by a unique rpcID, which contains the ID information of a pair of microservices.
For example, rpcID 0.1.1 and 0.1.2 denote two calls sent from the same UM to two different DMs.
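The rpcID example can be sketched in Python as follows. The sketch assumes that dropping the last dotted component of an rpcID yields the rpcID of the call that reached the UM, so calls issued by the same UM share a prefix; this is an interpretation of the 0.1.1 / 0.1.2 example, and the helper names are hypothetical.

```python
from collections import defaultdict


def parent_rpc_id(rpc_id):
    """Drop the last dotted component: '0.1.2' -> '0.1'.

    Returns None for an ID with no dot (assumed to be the origin request).
    """
    head, sep, _ = rpc_id.rpartition(".")
    return head if sep else None


def group_by_caller(rpc_ids):
    """Group rpcIDs that share a prefix, i.e. calls assumed to come from the same UM."""
    groups = defaultdict(list)
    for rid in rpc_ids:
        groups[parent_rpc_id(rid)].append(rid)
    return dict(groups)


print(group_by_caller(["0.1", "0.1.1", "0.1.2"]))
# {'0': ['0.1'], '0.1': ['0.1.1', '0.1.2']}
```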

ANATOMY OF CALL GRAPHS

Call graphs present several distinct features and are substantially different from traditional DAG
graphs of batch processing jobs.

Although most call graphs contain a small number of microservices and have three tiers, a
non-negligible number of graphs are big and deep.

The size of microservice call graphs follows a Burr distribution.

The scale of existing benchmarks is far smaller than that in real traces. For large call graphs, about
50% of their microservices are Memcacheds. This percentage is 20% higher than that in call
graphs of small size.

Getting hot data from Memcacheds is much faster than from databases, so maintaining a large number
of Memcacheds can significantly reduce the RT of complicated services.
The common graph depth in the Alibaba trace is 3.
The reason behind this is that, when serving an online request in the Alibaba cluster, a microservice
usually calls multiple downstream microservices, which will then query data from Memcacheds (MCs)
directly, as such data is usually hot data that is frequently accessed by other requests and
cached in Memcacheds, e.g., the information of goods in an online store.

The average depth of a call graph is 4.27, with a standard deviation of 3.25.

The call depth of microservice graphs is in general shorter than the critical path length presented by
DAG graphs from batch applications in Alibaba clusters.

A microservice call graph behaves like a tree, and many of them contain only a long chain, as in
this figure.

The call depth stagnates when the number of microservices increases. This is because a
microservice graph tends to branch out quickly like a tree to include more two-tier invocations.

Once a call is sent to a stateful microservice, it will not incur further calls. This scattering property is
different from that observed in traditional DAG graphs, which usually contain both scatter and
gather components.

To further validate this argument:

More than 10% of stateless microservices have an out-degree of at least 5, while most microservices
have an in-degree of one. As a comparison, more than 99% of vertices in DAG graphs have
out-degrees of no more than 3, while their in-degrees follow a long-tail distribution.

Many tiers contain only one microservice. Once the depth becomes larger than 2, the
corresponding tier includes only one microservice with high probability.

For these graphs, detecting bottlenecks can be relatively easy. One can efficiently derive the
processing time of each individual microservice along the chain and check whether an overload
occurs based on information from historical traces.

Many stateless microservices are hot-spots. To quantify to what extent a single microservice can be
shared by all call graphs, we explore the distribution of the in-degree (out-degree) of stateless
microservices in aggregate calls.

Aggregate calls count all the invocations related to each individual microservice from all call
graphs. More than 5% of microservices have in-degrees of 16 in aggregate calls. These
super microservices appear in nearly 90% of call graphs and handle 95% of total invocations in
the Alibaba traces.
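The aggregate view can be illustrated with a small Python sketch. It assumes that the aggregate in-degree of a microservice is the number of distinct UMs that call it across all call graphs; the data layout (trace ID mapped to a list of (UM, DM) pairs) and the function names are hypothetical.

```python
from collections import defaultdict


def aggregate_in_degree(call_graphs):
    """Count the distinct callers (UMs) of each microservice over all call graphs.

    `call_graphs` maps a trace ID to a list of (um, dm) pairs.
    """
    callers = defaultdict(set)
    for calls in call_graphs.values():
        for um, dm in calls:
            callers[dm].add(um)
    return {ms: len(ums) for ms, ums in callers.items()}


def graph_share(call_graphs, microservice):
    """Fraction of call graphs in which `microservice` appears at least once."""
    if not call_graphs:
        return 0.0
    hits = sum(
        1
        for calls in call_graphs.values()
        if any(microservice in pair for pair in calls)
    )
    return hits / len(call_graphs)


graphs = {
    "t1": [("A", "X"), ("B", "X")],
    "t2": [("C", "X"), ("C", "Y")],
}
print(aggregate_in_degree(graphs))  # {'X': 3, 'Y': 1}
print(graph_share(graphs, "X"))     # 1.0
```

A microservice like X here, with a high aggregate in-degree and a presence in every graph, is what the notes call a super microservice.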

This result implies that a loosely-coupled microservice architecture leads to a significant imbalance of
workload across different microservices. This is useful for resource scaling: individual
microservices can be scaled independently, and many more containers can be allocated to these
super microservices.

Microservice call graphs are highly dynamic

Another distinction of microservice call graphs is that they can present significant topological
differences from each other, even among the graphs generated by the same online service.

Each online service is represented by an entering microservice, which is called by the user directly.
There are more than 3000 different services in total in the Alibaba traces. Once a call is sent to an
entering microservice, the subsequent calls can be quite complicated, depending on the status of the
user. We apply graph learning algorithms to cluster microservice call graphs into different clusters
based on their topology.
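To make "topologically different" concrete, the sketch below groups the call graphs of one service by an exact shape signature. This is a much simpler stand-in, not the graph learning algorithm used in the paper: it assumes each call graph is an acyclic tree rooted at the entering microservice, and it ignores which microservices are called, looking only at the shape. All names are hypothetical.

```python
from collections import defaultdict


def tree_signature(calls, entry):
    """Canonical string for the shape of a tree-like call graph rooted at `entry`.

    Children signatures are sorted, so the order of calls does not matter.
    Assumes the graph has no cycles.
    """
    children = defaultdict(list)
    for um, dm in calls:
        children[um].append(dm)

    def sig(node):
        return "(" + "".join(sorted(sig(child) for child in children[node])) + ")"

    return sig(entry)


def group_by_topology(call_graphs, entry):
    """Map each shape signature to the trace IDs whose call graph has that shape."""
    classes = defaultdict(list)
    for trace_id, calls in call_graphs.items():
        classes[tree_signature(calls, entry)].append(trace_id)
    return dict(classes)


graphs = {
    "t1": [("A", "B"), ("A", "C")],  # A fans out to two DMs
    "t2": [("A", "D"), ("A", "E")],  # same shape, different DMs
    "t3": [("A", "B"), ("B", "C")],  # a chain of three tiers
}
print(group_by_topology(graphs, "A"))
# {'(()())': ['t1', 't2'], '((()))': ['t3']}
```

Exact matching like this yields one class per distinct shape; a clustering algorithm would instead merge graphs whose shapes are merely similar.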

GRAPH LEARNING ALGORITHMS
