How To Autoscale Apps On Kubernetes With Custom Metrics

10/4/2019 How to autoscale apps on Kubernetes with custom metrics
Daniel Weibel
How to autoscale apps on Kubernetes with custom metrics
PUBLISHED IN OCTOBER 2019
Welcome to Bite-sized Kubernetes learning, a regular column on the most
interesting questions that we see online and during our workshops, answered
by a Kubernetes expert.
https://learnk8s.io/autoscaling-apps-kubernetes/
Today's answers are curated by Daniel Weibel. Daniel is a software
engineer and instructor at Learnk8s.
If you wish to have your question featured on the next episode, please get in
touch via email or you can tweet us at @learnk8s.
Did you miss the previous episodes? You can find them here.
How do you scale apps on Kubernetes?

Deploying an app to production with a static configuration is not optimal.
Traffic patterns can change quickly, and the app should be able to adapt to
them:
When demand increases, the app should scale up (increasing the number
of replicas) to stay responsive.
When demand decreases, the app should scale down (decreasing the
number of replicas) to not waste resources.
Kubernetes provides excellent support for autoscaling applications in the
form of the Horizontal Pod Autoscaler.
In the following, you will learn how to use it.
Different types of autoscaling
First of all, to eliminate any misconceptions, let's clarify the use of the term
"autoscaling" in Kubernetes.
In Kubernetes, several things are referred to as "autoscaling", including:
Horizontal Pod Autoscaler: adjusts the number of replicas of an application
Vertical Pod Autoscaler: adjusts the resource requests and limits of a
container
Cluster Autoscaler: adjusts the number of nodes of a cluster
While these components all "autoscale" something, they are completely
unrelated to each other.
They all address very different use cases and use different concepts and
mechanisms.
They are developed in separate projects and can be used independently
from each other.
This article covers the Horizontal Pod Autoscaler.
What is the Horizontal Pod Autoscaler?

The Horizontal Pod Autoscaler is a built-in Kubernetes feature that allows you
to scale applications horizontally based on one or more monitored metrics.
Horizontal scaling means increasing and decreasing the number of
replicas. Vertical scaling means increasing and decreasing the
compute resources of a single replica.
Technically, the Horizontal Pod Autoscaler is a controller in the Kubernetes
controller manager, and it is configured by HorizontalPodAutoscaler
resource objects.
The Horizontal Pod Autoscaler can monitor a metric about an app and
continuously adjust the number of replicas to optimally meet the current
demand.
Resources that can be scaled by the Horizontal Pod Autoscaler
include the Deployment, StatefulSet, ReplicaSet, and
ReplicationController.
To autoscale an app, the Horizontal Pod Autoscaler executes a continuous
control loop:
[Figure: the Horizontal Pod Autoscaler control loop around the app: 1. Query, 2. Calculate, 3. Scale, repeated every 15 seconds]
The steps of this control loop are:
1. Query the scaling metric
2. Calculate the desired number of replicas
3. Scale the app to the desired number of replicas
The default period of the control loop is 15 seconds.
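The three steps above can be sketched as a small loop. This is only an illustration of the query, calculate, scale cycle, not the actual controller code: the function name `run_control_loop` and the three callables passed to it are hypothetical placeholders, and the replica calculation in the usage example is an arbitrary stand-in.

```python
import time
from typing import Callable


def run_control_loop(
    query_metric: Callable[[], float],
    calculate_replicas: Callable[[float], int],
    scale_to: Callable[[int], None],
    period_seconds: float = 15.0,
    iterations: int = 1,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run a simplified query -> calculate -> scale loop a fixed number of times."""
    for i in range(iterations):
        current_value = query_metric()               # 1. query the scaling metric
        desired = calculate_replicas(current_value)  # 2. calculate the replica count
        scale_to(desired)                            # 3. scale the app
        if i < iterations - 1:
            sleep(period_seconds)                    # wait for the next period


# Usage with stand-in functions (all hypothetical): a constant metric value,
# a toy calculation, and a list that records the requested replica counts.
observed = []
run_control_loop(
    query_metric=lambda: 20.0,
    calculate_replicas=lambda value: max(1, round(value / 10)),
    scale_to=observed.append,
    iterations=1,
)
print(observed)  # [2]
```

Passing the three steps in as functions keeps the loop itself independent of where the metric comes from and how the app is scaled, which mirrors the separation described in the rest of this article.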

The calculation of the desired number of replicas is based on the scaling
metric and a user-provided target value for this metric.
The goal is to calculate a replica count that brings the metric value as close
as possible to the target value.
For example, imagine that the scaling metric is the per-second request rate
per replica:
If the target value is 10 req/sec and the current value is 20 req/sec, the
Horizontal Pod Autoscaler will scale the app up (i.e. increasing the number
of replicas) to make the metric decrease and get closer to the target value.
If the target value is 10 req/sec and the current value is 2 req/sec, the
Horizontal Pod Autoscaler will scale the app down (i.e. decreasing the
number of replicas) to make the metric increase and get closer to the target
value.
The algorithm for calculating the desired number of replicas is based on the
following formula:
X = ceil(N * (c/t))
Where X is the desired number of replicas, N is the current number of
replicas, c is the current value of the metric, t is the target value, and
ceil rounds the result up to the next whole number.
You can find the details about the algorithm in the documentation.
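As a rough illustration, the formula can be written as a few lines of Python. The function name `desired_replicas` is made up for this sketch, and the result is rounded up to a whole replica. The real controller applies further safeguards (for example a tolerance around the target), which are left out here.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """Return ceil(N * c / t) for N replicas, current value c, and target t."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    return math.ceil(current_replicas * current_value / target_value)


# Target of 10 req/sec per replica, as in the examples above.
print(desired_replicas(3, 20, 10))  # 6: load is twice the target, so scale up
print(desired_replicas(5, 2, 10))   # 1: load is a fifth of the target, so scale down
print(desired_replicas(4, 15, 10))  # 6: fractional results are rounded up
```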
That's how the Horizontal Pod Autoscaler works, but how do you use it?
How to configure the Horizontal Pod Autoscaler?
Configuring the Horizontal Pod Autoscaler to autoscale your app is done by
creating a HorizontalPodAutoscaler resource.
This resource allows you to specify the following parameters:
1. The resource to scale (e.g. a Deployment)
2. The minimum and maximum number of replicas
3. The scaling metric
4. The target value for the scaling metric
As soon as you create this resource, the Horizontal Pod Autoscaler starts
executing the above-mentioned control loop against your app with the
provided parameters.
A concrete HorizontalPodAutoscaler resource looks like this:
hpa.yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: myhpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: myapp_requests_per_second
      target:
        type: AverageValue
        averageValue: 2
There are different versions of the HorizontalPodAutoscaler resource
that differ in their manifest structure. The above example uses version
v2beta2, which is the most recent one at the time of this writing.
This resource specifies a Deployment named myapp to be autoscaled
between 1 and 10 replicas based on a metric named
myapp_requests_per_second with a target value of 2.
You can imagine that the myapp_requests_per_second metric represents the
request rate of the individual Pods of this Deployment, so the intention of
this specification is to autoscale the Deployment with the goal of maintaining
a request rate of 2 requests per second for each of the Pods.
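To see how the parameters of this resource interact, here is a small sketch that combines the target value with the minimum and maximum number of replicas. The function name `bounded_replicas` is hypothetical, and the sketch assumes that the calculated replica count is simply clamped to the configured range; the real controller has additional behaviour that is not modelled here.

```python
import math


def bounded_replicas(
    current_replicas: int,
    current_value: float,
    target_value: float,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Calculate the desired replica count and clamp it to [min_replicas, max_replicas]."""
    raw = math.ceil(current_replicas * current_value / target_value)
    return max(min_replicas, min(max_replicas, raw))


# Values taken from the manifest above: target 2 req/sec per Pod, 1 to 10 replicas.
print(bounded_replicas(4, 3, 2, 1, 10))  # 6: within the allowed range
print(bounded_replicas(4, 8, 2, 1, 10))  # 10: 16 would be needed, capped at maxReplicas
print(bounded_replicas(4, 0, 2, 1, 10))  # 1: no traffic, held at minReplicas
```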
So far, this all sounds good and nice — but there's a catch.
Where do the metrics come from?
What is the metrics registry?

The entire autoscaling mechanism is based on metrics that represent the
current load of an application.
When you define a HorizontalPodAutoscaler resource you have to specify
such a metric.
But how does the Horizontal Pod Autoscaler know how to obtain these
metrics?
It turns out that there's another component in play — the metrics registry.
The Horizontal Pod Autoscaler queries metrics from the metrics registry:
[Figure: the same control loop, with the Horizontal Pod Autoscaler querying the metrics registry in step 1 before calculating and scaling the app]
The metrics registry is a central place in the cluster where metrics (of any
kind) are exposed to clients (of any kind).
The Horizontal Pod Autoscaler is one of these clients.
The purpose of the metrics registry is to provide a standard interface for
clients to query metrics from.
The interface of the metrics registry consists of three separate APIs:
The Resource Metrics API
The Custom Metrics API
The External Metrics API
[Figure: metrics sources feed the metrics registry, which exposes the Resource Metrics API, Custom Metrics API, and External Metrics API to clients]
These APIs are designed to serve different types of metrics:
Resource Metrics API: predefined resource usage metrics (CPU and
memory) of Pods and Nodes
Custom Metrics API: custom metrics associated with a Kubernetes object
External Metrics API: custom metrics not associated with a Kubernetes
object
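The split between the three APIs can be made a bit more concrete with a sketch that maps the metric types of a HorizontalPodAutoscaler resource to the API group that serves them. The helper name `pods_metric_path`, the `v1beta1` version segment, and the `default` namespace are assumptions for illustration; the versions available in your cluster may differ.

```python
# Metric type in the HorizontalPodAutoscaler spec -> API group that serves it.
API_GROUP_BY_METRIC_TYPE = {
    "Resource": "metrics.k8s.io",            # Resource Metrics API (CPU, memory)
    "Pods": "custom.metrics.k8s.io",         # Custom Metrics API (per-Pod metrics)
    "Object": "custom.metrics.k8s.io",       # Custom Metrics API (other objects)
    "External": "external.metrics.k8s.io",   # External Metrics API
}


def pods_metric_path(namespace: str, metric_name: str, version: str = "v1beta1") -> str:
    """Build the API server path for a custom metric across all Pods in a namespace."""
    group = API_GROUP_BY_METRIC_TYPE["Pods"]
    return f"/apis/{group}/{version}/namespaces/{namespace}/pods/*/{metric_name}"


print(API_GROUP_BY_METRIC_TYPE["External"])
print(pods_metric_path("default", "myapp_requests_per_second"))
```

A path like the one printed last is the kind of request a client sends to the Kubernetes API server to read a custom metric, provided a metric API server for the Custom Metrics API is installed.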
All of these metric APIs are extension APIs.
That means they are extensions to the core Kubernetes API that are
accessible through the Kubernetes API server.
What does that mean for you if you want to autoscale an app?
Any metric that you want to use as a scaling metric must be exposed
through one of these three metric APIs, because only then is it accessible
to the Horizontal Pod Autoscaler.
So, to autoscale an app, your task is not only to configure the Horizontal
Pod Autoscaler. You also have to expose your desired scaling metric through
the metrics registry.
How do you expose a metric through a metric API?
By installing and configuring additional components in your cluster.
For each metric API you need a corresponding metric API server and you
need to configure it to expose a specific metric through the metric API.
By default, no metric API servers are installed in Kubernetes, which
means that the metric APIs are not enabled by default.
Furthermore, you need a metrics collector that collects the desired metrics
from the sources (e.g. from the Pods of the target app) and provides them to
the metric API server.
[Figure: metric collectors (cAdvisor, Prometheus) gather metrics from the sources; metric API servers (Metrics Server, Prometheus Adapter) expose them to clients through the Resource, Custom, and External Metrics APIs]
There are different choices of metric API servers and metric collectors for the
different metrics APIs.
Resource Metrics API:
The metrics collector is cAdvisor, which runs as part of the kubelet on
every worker node (so it's already installed by default)
The official metric API server for the Resource Metrics API is the Metrics
Server
Custom Metrics API and External Metrics API:
A popular choice for the metrics collector is Prometheus; however, other
metrics systems like Datadog or Google Stackdriver may be used instead
The Prometheus Adapter is a metric API server that integrates with
Prometheus as a metric collector — however, other metric collectors have
their own metric API servers
So, to expose a metric through one of the metric APIs, you have to go
through these steps:
1. Install a metrics collector (e.g. Prometheus) and configure it to collect the
desired metric (e.g. from the Pods of your app)
2. Install a metric API server (e.g. the Prometheus Adapter) and configure it to
expose the metric from the metrics collector through the corresponding metrics API
Note that this applies specifically to the Custom Metrics API and
External Metrics API, which serve custom metrics. The Resource
Metrics API only serves default metrics and can't be configured to
serve custom metrics.
This was a lot of information, so let's put the bits together.
Putting everything together
Let's go through a full example of configuring an app to be autoscaled by the
Horizontal Pod Autoscaler.
Imagine you want to autoscale a web app based on the average per-second
request rate of the replicas.
Also, assume that you want to use a Prometheus-based setup for exposing
the request rate metric through the Custom Metrics API.
The request rate is a custom metric associated with a Kubernetes
object (Pods), so it must be exposed through the Custom Metrics API.
Here's a sequence of steps to reach your goal:
1. Instrument your app to expose the total number of received requests as a
Prometheus metric
2. Install Prometheus and configure it to collect this metric from all the Pods of
your app
3. Install the Prometheus Adapter and configure it to turn the metric from
Prometheus into a per-second request rate (using PromQL) and expose
that metric as myapp_requests_per_second through the Custom Metrics API
4. Create a HorizontalPodAutoscaler resource (as shown above) specifying
myapp_requests_per_second as the scaling metric and an appropriate target
value
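Step 3 turns a running total into a per-second rate. The idea behind that conversion can be sketched in plain Python as follows. This is a simplified illustration, not how Prometheus or the Prometheus Adapter are implemented: the function names are hypothetical, and the sketch ignores details such as counter resets that a real rate calculation has to handle.

```python
from typing import List, Tuple


def per_second_rate(samples: List[Tuple[float, float]]) -> float:
    """Average per-second increase of a counter.

    samples: (timestamp_in_seconds, total_requests) pairs, oldest first.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must be ordered by increasing timestamp")
    return (v1 - v0) / (t1 - t0)


def average_rate_per_pod(rates: List[float]) -> float:
    """Average the per-Pod rates, as an AverageValue target compares against."""
    if not rates:
        raise ValueError("need at least one Pod")
    return sum(rates) / len(rates)


# One Pod served 120 requests within 60 seconds.
pod_a = per_second_rate([(0, 100), (60, 220)])
# Another Pod served 240 requests within the same 60 seconds.
pod_b = per_second_rate([(0, 50), (60, 290)])
print(pod_a)                                 # 2.0
print(pod_b)                                 # 4.0
print(average_rate_per_pod([pod_a, pod_b]))  # 3.0
```

With a target value of 2 requests per second per Pod, an average of 3.0 would lead the Horizontal Pod Autoscaler to scale the app up.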
As soon as the HorizontalPodAutoscaler resource is created, the Horizontal
Pod Autoscaler kicks in and starts autoscaling your app according to your
configuration.
And you can lean back and watch your app adapting to traffic.
This article sets the theoretical framework for autoscaling an application
based on a custom metric.
In a future article, you will put this knowledge into practice and execute the
above steps with your own app on your own cluster.
From zero to a fully autoscaled application.
Stay tuned!
That's all folks!

If you enjoyed this article, you might find the following articles interesting:
Architecting Kubernetes clusters — choosing a worker node size where you'll learn the pros
and cons of having clusters with large and small instance types for your cluster nodes.
Boosting your kubectl productivity. If you work with Kubernetes, then kubectl is probably one
of your most-used tools. Whenever you spend a lot of time working with a specific tool, it is
worth getting to know it very well and learning how to use it efficiently.
More autoscaling, metrics, etc.

The article is a summary of the first three modules of the autoscaling
course on the Learnk8s Academy. The full course includes a deep dive
into the three different metrics servers as well as how to:
expose metrics from your application
install and configure Prometheus to collect metrics
configure the custom and external metrics adapters to serve custom metrics to Kubernetes
tune the Horizontal Pod Autoscaler
Learn more ⇢
