Session 11.

Monitoring
Ram N Sangwan
Agenda
• Introduction to Monitoring
• Client Libraries
• Scraping: Pull vs Push Monitoring
• Querying
• Service Discovery
• Node Exporters
• Expression Browser

Key-Value Data Model
• Before starting with the Prometheus tools, it is important to get a complete
understanding of the data model.
• Prometheus works with key-value pairs. The key describes what you are
measuring, while the value stores the actual measurement as a number.
• Remember: Prometheus is not meant to store raw information such as plain
text; it stores numeric metrics aggregated over time.
• The key in this case is called a metric. It could be, for example, a CPU rate
or memory usage.
• But what if you wanted to give more details about your metric?
• What if your CPU has four cores and you want four separate metrics for
them?

Labels
• This is where the concept of labels
comes into play.
• Labels are designed to provide more
detail about your metrics by attaching
additional fields to them.
• You would not simply describe the
CPU rate; you would describe the
CPU rate for core one located at a
certain IP address, for example.
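The idea can be sketched in a few lines of Python. This is only an illustration of the data model, not how Prometheus stores series internally; the metric name `cpu_usage`, the label names `core` and `instance`, and the values are hypothetical.

```python
def format_sample(name, labels, value):
    """Render one sample as: name{label="value",...} value"""
    # Sort labels so the same label set always renders the same way.
    pairs = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
    if pairs:
        return f"{name}{{{pairs}}} {value}"
    return f"{name} {value}"


# One series per core: same metric name, different label values.
samples = [
    format_sample("cpu_usage", {"core": str(core), "instance": "10.0.0.1"}, usage)
    for core, usage in enumerate([0.42, 0.17, 0.93, 0.08])
]

for line in samples:
    print(line)
```

Each distinct combination of metric name and label values identifies its own time series, which is how four cores become four separate series.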

Metric Types and Counter
Metric Types
When monitoring a metric, there are essentially four ways you can describe it
with Prometheus.
• Counter - Probably the simplest metric type you can use. A counter,
as its name suggests, counts elements over time. Like a physical
counter, it only goes up or resets.
• As a consequence, a counter is not suited to values that can go
down or to negative values.
• A counter is particularly suited to counting the number of occurrences of a
certain event over a period, i.e. the rate at which your metric evolved over time.
• Now what if you wanted to measure the current memory usage at a given time,
for example?
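A rough sketch of how a rate can be derived from counter samples, including the reset case, is shown below. It is a simplification written for this slide (the real query functions also handle timestamps and extrapolation); the sample values and the 60-second window are made up.

```python
def counter_increase(samples):
    """Total increase across successive counter samples, tolerating resets."""
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            # The counter went down, so assume it was reset to zero
            # (for example after a restart) and count from there.
            total += current
    return total


def per_second_rate(samples, window_seconds):
    """Average per-second rate of the counter over the window."""
    return counter_increase(samples) / window_seconds


# Hypothetical request counter sampled five times; it resets after 130.
scraped = [100, 120, 130, 5, 25]
print(counter_increase(scraped))     # 20 + 10 + 5 + 20 = 55.0
print(per_second_rate(scraped, 60))
```

Because the counter only goes up between resets, a drop in value can safely be read as a reset rather than as a negative change.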

Gauges
The memory usage can go down, how can it be measured with Prometheus?
Gauges
• Gauges are designed to handle values that may decrease over time.
• Visually, they are like thermometers: at any given time, if you observe the
thermometer, you can see the current temperature value.
• But, if gauges can go up and down, accept positive or negative values, aren’t
they a superset of counters?
• Gauges are perfect when you want to monitor the current value of a metric
that can decrease over time.
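• However, a gauge only shows the value at the moment it is sampled: changes
that happen between two scrapes are lost, so gauges are a poor fit for counting
events or measuring their rate over time.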

Histogram
• A histogram is a more complex metric type. It provides additional information about
your metrics, such as the sum of the observations and their count.
• Values are aggregated in buckets with configurable upper bounds. This means
that with histograms you are able to:
• Compute averages: the average is the sum of your values divided by
the number of values recorded.
• Compute fractional measurements on your values: for a given bucket, you can
tell how many values meet a given criterion. This is especially useful when you
want to monitor proportions or establish quality indicators.
• For example, in a real-world context, you may want to be alerted when 20% of your
servers respond in more than 300ms, or when your servers respond in more than 300ms
more than 20% of the time.
• As soon as proportions are involved, histograms can and should be used.
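The two computations above (average, and share of values beyond a bound) can be sketched with a toy histogram. This is an illustrative model only, not a client library implementation; the bucket bounds (in seconds) and the latency values are assumptions chosen to match the 300ms example.

```python
import bisect


class Histogram:
    """Toy histogram: per-bucket counts, plus a running sum and count."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # One slot per upper bound plus a final catch-all (+Inf) slot.
        self.bucket_counts = [0] * (len(self.upper_bounds) + 1)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        # Index of the first bound >= value; values above every bound
        # fall into the final catch-all slot.
        index = bisect.bisect_left(self.upper_bounds, value)
        self.bucket_counts[index] += 1
        self.sum += value
        self.count += 1

    def average(self):
        return self.sum / self.count

    def fraction_above(self, bound):
        """Share of observations greater than one of the configured bounds."""
        index = self.upper_bounds.index(bound)
        at_or_below = sum(self.bucket_counts[: index + 1])
        return 1 - at_or_below / self.count


latencies = Histogram([0.1, 0.3, 1.0])
for seconds in [0.05, 0.2, 0.25, 0.4, 0.35, 0.15, 0.28, 0.09, 0.12, 0.5]:
    latencies.observe(seconds)

if latencies.fraction_above(0.3) > 0.2:
    print("alert: more than 20% of responses took longer than 300ms")
```

Only the bucket counts, the sum and the count are kept, so the individual observations never need to be stored.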

Summaries
• Summaries are an extension of histograms. Besides also providing the sum
and the count of observations, they provide quantile metrics over sliding
windows.
• As a reminder, quantiles divide a probability distribution into
ranges of equal probability.
• Histograms or summaries? Essentially, the intent is different.
• Histograms aggregate values over time, giving a sum and a count
that make it easy to see the evolution of a given metric.
• Summaries, on the other hand, expose quantiles over sliding windows (i.e.
continuously evolving over time).
• This is particularly handy to get the value below which 95% of the recorded
values fall (the 95th percentile).
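A minimal sketch of a quantile over a sliding window follows. It makes two simplifying assumptions that real summaries do not: the window is bounded by a number of observations rather than by time, and the quantile is computed exactly with the nearest-rank method instead of a streaming estimate. The class name and values are hypothetical.

```python
import math
from collections import deque


class SlidingWindowSummary:
    """Toy summary: total sum and count, plus quantiles over recent values."""

    def __init__(self, window_size):
        # deque(maxlen=...) silently drops the oldest value when full.
        self.window = deque(maxlen=window_size)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        self.window.append(value)
        self.sum += value
        self.count += 1

    def quantile(self, q):
        """Nearest-rank quantile of the values currently in the window."""
        ordered = sorted(self.window)
        rank = max(1, math.ceil(q * len(ordered)))
        return ordered[rank - 1]


response_ms = SlidingWindowSummary(window_size=100)
for value in range(1, 201):
    response_ms.observe(value)

# Only the last 100 observations (101..200) are still in the window.
print(response_ms.quantile(0.95))
```

The sum and count keep growing with every observation, while the quantile only reflects what is still inside the window.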

Prometheus Monitoring: A Rich Ecosystem
• The main functionality of Prometheus, besides monitoring, is being a time series database.
• However, when working with a time series database, you often need to visualize the data,
analyze it and set up custom alerting on it.
• Here are the tools that make up the Prometheus ecosystem and enrich its functionality:
• Alertmanager: Prometheus pushes alerts to the Alertmanager based on custom rules defined in
configuration files. From there, you can route them to multiple endpoints such as PagerDuty or
Slack.
• Data visualization: similarly to Grafana, you can visualize your time series directly in the Prometheus
Web UI. You can easily filter and get a concrete overview of what is happening on your different
targets.
• Service discovery: Prometheus can discover your targets dynamically and automatically scrape new
targets on demand. This is particularly handy when working with containers, whose addresses can
change dynamically depending on demand.

Prometheus Architecture

Prometheus Monitoring Use Cases
DevOps Industry
• With all the exporters built for systems, databases and servers, Prometheus
clearly targets the DevOps industry first.
• The necessary effort to get your instances up and running is very low and
every satellite tool can be easily activated and configured on demand.
Healthcare
• Nowadays, monitoring solutions are not made only for IT professionals. They
are also made to support large industries, providing resilient and scalable
architectures for healthcare.
• As demand grows, the IT architectures deployed have to
keep up with it. Without a reliable way to monitor your entire
infrastructure, you run the risk of massive outages on your
services.

Client Libraries
• Before you can monitor your services, you need to add instrumentation to their
code via one of the Prometheus client libraries.
• These implement the Prometheus metric types.
• Choose a Prometheus client library that matches the language in which your
application is written.
• This lets you define and expose internal metrics via an HTTP endpoint on your
application’s instance:
• Official client libraries: Go, Java or Scala, Python, Ruby.
• Unofficial third-party client libraries: Bash, Dart, .NET / C#, Node.js, Perl,
PHP, R, etc.

Client Libraries
• When Prometheus scrapes your instance's HTTP endpoint, the client library
sends the current state of all tracked metrics to the server.
• If no client library is available for your language, or you want to avoid
dependencies, you may also implement one of the supported exposition
formats yourself to expose metrics.
• When implementing a new Prometheus client library, please follow the
guidelines on writing client libraries.
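As a rough illustration of exposing metrics without a client library, the sketch below serves a plain-text `/metrics` endpoint using only the Python standard library. It covers just the simplest `name value` lines of the text format (no labels, help or type lines); the metric names and port 8000 are hypothetical choices.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process metric state, updated elsewhere by the application.
METRICS = {"myapp_requests_total": 0, "myapp_in_flight_requests": 0}


def render_metrics(metrics):
    """Render metrics as 'name value' lines, one per metric."""
    return "".join(f"{name} {value}\n" for name, value in sorted(metrics.items()))


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # Every scrape gets the current state of all tracked metrics.
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve /metrics until interrupted; the port is an arbitrary choice."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint. In practice a client library is usually the better option, since it also takes care of metric types, labels and escaping.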

Pull vs Push
• There is a noticeable difference between Prometheus monitoring and other
time series databases:
• Prometheus actively scrapes targets in order to retrieve metrics from them.
• This is very different from InfluxDB, for example, where you would essentially push data
directly to it.

Pull vs Push
• Both approaches have their advantages and drawbacks.
• From the literature available on the subject, here is a list of reasons behind this
architectural choice:
• Centralized control: if Prometheus initiates queries to its targets, your
whole configuration is done on the Prometheus server side and not on your
individual targets.
• Prometheus is the one deciding which targets to scrape and how often to
scrape them.
• With a push-based system, targets may send too much data
to your server and essentially crash it. A pull-based system
enables rate control, with the flexibility of having multiple scrape
configurations and thus different rates for different targets.
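The pull model with per-target rates can be sketched as follows. This is a toy scheduler, not the Prometheus scrape loop; the target URLs, the intervals and the `last_scrape` bookkeeping field are all hypothetical.

```python
import urllib.request


def due_targets(targets, now):
    """Return the URLs whose scrape interval has elapsed at time `now`."""
    return [
        target["url"]
        for target in targets
        if now - target["last_scrape"] >= target["interval"]
    ]


def scrape(url, timeout=5):
    """Pull the current metrics text from one target over HTTP."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# The server side owns the whole configuration: which targets, how often.
targets = [
    {"url": "http://10.0.0.1:9100/metrics", "interval": 15, "last_scrape": 0},
    {"url": "http://10.0.0.2:8000/metrics", "interval": 60, "last_scrape": 0},
]

print(due_targets(targets, now=30))
```

Because the server decides when to call `scrape`, a misbehaving target cannot flood it with data; the worst case is one slow or failed request per interval.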

Pull vs Push
• Prometheus is not an event-based system, and this is very different from other
time series databases.
• Prometheus is not designed to catch individual, one-off events in time
(such as a service outage) but to gather pre-aggregated metrics about your
services.
• Concretely, you won't send a 404 error message from your web service along
with the message that caused the error; instead, you expose the fact that your
service received one 404 error in the last five minutes.
• This is the basic difference between a time series database targeted at
aggregated metrics and one designed to gather 'raw metrics'.

Target Discovery
• Prometheus can discover its targets dynamically, from a file of targets for
example, which makes it an ideal solution for stacks that rely heavily on
containers and on distributed architectures.
• In a world where instances are created as fast as they are destroyed, service
discovery is a must-have for every DevOps stack.
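File-based discovery can be pictured as re-reading a list of target groups whenever the file changes. The sketch below parses such a list from JSON; the `targets` and `labels` keys follow the layout used by Prometheus file-based service discovery, while the addresses and the `env` label are made up for illustration.

```python
import json


def load_targets(text):
    """Flatten target groups into (address, labels) pairs."""
    discovered = []
    for group in json.loads(text):
        labels = group.get("labels", {})
        for address in group["targets"]:
            discovered.append((address, labels))
    return discovered


# Hypothetical file content: a list of target groups with optional labels.
example = """
[
  {"targets": ["10.0.0.1:9100", "10.0.0.2:9100"], "labels": {"env": "prod"}},
  {"targets": ["10.0.0.3:8000"]}
]
"""

for address, labels in load_targets(example):
    print(address, labels)
```

An orchestration tool can rewrite this file as containers come and go, so the list of scraped targets follows the infrastructure without manual changes.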

Exporters
• For custom applications, instrumentation is very handy as it allows you to
customize the metrics exposed and how they change over time.
• For 'well-known' applications, servers or databases, exporters built by the
Prometheus project and by vendors are available to monitor your targets.
• Those exporters are easily configurable to monitor your existing targets.
• Examples of exporters include:
• Database exporters: for MongoDB databases, SQL servers, and MySQL servers.
• HTTP exporters: for HAProxy, Apache or NGINX servers.
• Unix exporters: you can monitor system performance using the Node Exporter, which
exposes complete system metrics out of the box.

Exporters with Prometheus

Expression Browser
• The expression browser is available at /graph on the Prometheus server,
allowing you to enter any expression and see its result either in a table or
graphed over time.
• This is primarily useful for ad-hoc queries and debugging.
• For graphs, use Grafana or console templates.
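An expression typed into the browser can also be sent to the server programmatically. The sketch below only builds the request URL for the HTTP API's instant query endpoint (`/api/v1/query`); the server address `http://localhost:9090` and the metric name `http_requests_total` are assumptions, not values taken from this deck.

```python
from urllib.parse import urlencode


def instant_query_url(base_url, expression):
    """Build the URL that evaluates `expression` at the current time."""
    query_string = urlencode({"query": expression})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


# Per-second request rate over the last five minutes (hypothetical metric).
url = instant_query_url("http://localhost:9090", "rate(http_requests_total[5m])")
print(url)
```

The same expression can be pasted into the expression browser to see the result as a table or a graph.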

What Next?
• Explore Client Libraries
• Explore Service Discovery Online Documentation
• Read more on Node Exporters

Thank You
