
Monitoring applications with Prometheus: intro,
Adform's experience, and practical tips
Giedrius Statkevičius, IT Systems Engineer
2019-03-28

1
Agenda
• Adform
• History leading up to Prometheus
• Prometheus
• Grafana
• Practical tips
• Adform's experience

2
https://bit.ly/2JKq6qN

3
Adform

4
Advertising Industry

Advertiser Media agency Publisher User

5
Key Milestones
Innovating the Automation of Buying and Selling Advertising

[Figure: timeline of key milestones from 2002 to 2018, starting with the founding of Adform. Milestone labels on the figure: TPAS, RM, DSP, PMP, RM, DMP, DSP, PPAS / SSP / AG, ISO/IEC 27001, MRC / TAG, Dynamic Ads]

6
Global Infrastructure

7
Monitoring

8
Control center

9
Why Is Monitoring Important?
Problems that we are trying to solve

• We want to know what our software is doing once it gets deployed
- Modern applications have a lot of moving parts, especially when using the microservices paradigm
- We want to know if we are following the defined SLAs
- Is it up?
- Is it performant?
- How many errors were returned?
- And so on...
- We want to have pretty dashboards and graphs of all of this
• We want to aggregate data per-component or per-machine
- It needs to have an easy-to-use querying language
• We need to be alerted when something is going wrong
• It needs to be resilient itself
- Easy to predict the storage requirements
- Highly available and fault-tolerant

10
History leading to Prometheus from
the perspective of Unix and Unix-like
systems

11
Lesson

12
History leading to Prometheus
Since the beginning... (almost)

• Simple letters on the screen
• File descriptors
• Standard input (0) / output (1) / error (2)
• Redirection of file descriptors to central storage (files)
• syslog(3)
• Sending syslog messages over UDP
- Designed by Eric Allman for sendmail in ~1984, first documented in 2001 (RFC 3164)
• Parsing of messages (grok in Logstash, etc.)
• We can send them over TCP/QUIC but... maybe in general we can do better?
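To make the parsing problem concrete, here is a minimal Python sketch of the BSD syslog line format described in RFC 3164 and of a grok-style regular expression that takes it apart again. The host name, tag, and message text are made up for illustration, and the regular expression is a simplification that covers only the common "timestamp host tag: message" layout.

```python
import re
import socket


def format_rfc3164(facility, severity, timestamp, host, tag, msg):
    """Build a BSD syslog line. The priority is facility * 8 + severity."""
    pri = facility * 8 + severity
    return f"<{pri}>{timestamp} {host} {tag}: {msg}"


def send_udp(line, server, port=514):
    """Fire-and-forget delivery: no acknowledgement and no sender authentication."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii", "replace"), (server, port))


# Simplified pattern: "<PRI>Mmm dd hh:mm:ss host tag: free-form message"
LINE_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<tag>[^:\s]+): (?P<msg>.*)$"
)


def parse_rfc3164(line):
    """Return the fields as a dict, or None if the line does not fit the pattern."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    pri = int(match.group("pri"))
    return {
        "facility": pri // 8,
        "severity": pri % 8,
        "timestamp": match.group("timestamp"),
        "host": match.group("host"),
        "tag": match.group("tag"),
        "msg": match.group("msg"),  # still unstructured text
    }


# Facility 2 is "mail" and severity 6 is "informational" in RFC 3164.
EXAMPLE_LINE = format_rfc3164(2, 6, "Mar 28 10:15:00", "mailhost", "sendmail", "message accepted")
```

Note that even after a successful parse the `msg` field is free-form text, so every application needs its own pattern to extract numbers from it.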

13
What if the structure of the
messages were
standardized?

14
How to control who sends
how many messages? How to
know if they are legit?

15
Let's turn around the whole
process and make the
structure according to our
requirements!

16
Introducing Prometheus
• Free software project that is a fusion of
different predecessors: Borgmon, Graphite,
etc.
• The underlying special database – the time
series database – got inspiration from
Facebook's Gorilla time series database
(paper:
http://www.vldb.org/pvldb/vol8/p1816-teller.pdf)
• Based on a sliding time window
• Hugely popular: 22k+ stars on GitHub, and
many large organizations are using it
• It, coupled with a few components, solves all
of the problems outlined before

17
What does the data look like?

18
Example metrics

19
Structure of Prometheus data
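• We call one data point a "sample" of a metric
• A time series is identified by its metric name and
a set of labels (name-value pairs); each sample
carries a value (64-bit floating point number) and
a timestamp
• Metric and label names use ASCII letters, digits,
and underscores (metric names may also contain
colons); label values can be any UTF-8 string
• Different labels provide different dimensions to
data
• The time-series database is specialized for this
kind of data and provides a high level of
compression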
• Example:
current_wind_speed{city="Kaunas"} 10
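As a rough illustration of this structure, the following Python sketch renders a sample in the same text form as the example above. The helper name `format_sample` is hypothetical and not part of any Prometheus library; the name patterns follow the Prometheus data model documentation.

```python
import re

# Allowed shapes of metric and label names in the Prometheus data model.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
LABEL_NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def format_sample(name, labels, value):
    """Render one sample as 'name{label="value",...} value'."""
    if not METRIC_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid metric name: {name!r}")
    parts = []
    for key in sorted(labels):
        if not LABEL_NAME_RE.fullmatch(key):
            raise ValueError(f"invalid label name: {key!r}")
        # Label values may be arbitrary text, so escape the special characters.
        escaped = (
            labels[key].replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        parts.append(f'{key}="{escaped}"')
    label_part = "{" + ",".join(parts) + "}" if parts else ""
    return f"{name}{label_part} {value}"


# Two label values give two separate time series of the same metric.
KAUNAS = format_sample("current_wind_speed", {"city": "Kaunas"}, 10)
VILNIUS = format_sample("current_wind_speed", {"city": "Vilnius"}, 7)
```

Each distinct combination of label values is its own time series, which is what makes per-city (or per-machine, per-component) aggregation possible later.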

20
How do we know that the data
is real?

21
Process of collecting metrics
• Prometheus itself sends HTTP GET requests to
the specified endpoints and parses the
metrics data
• This happens periodically, and the
timestamp of each scrape gets written to the
time series database (that's where the
word "time" comes from)
• We know that the data is legitimate since
collection is initiated from the Prometheus
side: we do not trust random senders
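The pull model can be sketched with the Python standard library alone. This is a toy illustration, not how Prometheus is implemented: `MetricsHandler` stands in for an instrumented application, `scrape` stands in for Prometheus, and the parser ignores comments and assumes lines carry no explicit timestamp.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class MetricsHandler(BaseHTTPRequestHandler):
    """A stand-in application that exposes one metric on /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = b'current_wind_speed{city="Kaunas"} 10\n'
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_target():
    """Start the stand-in application on a free local port."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def scrape(url):
    """Fetch a target and stamp every sample with the scrape time."""
    scrape_time = time.time()  # the collector, not the target, sets the time
    # Bypass any proxy configured in the environment for this local request.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        text = response.read().decode("utf-8")
    samples = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        samples.append((series, float(value), scrape_time))
    return samples
```

Calling `scrape` in a loop with a fixed sleep between calls would mimic the periodic collection; the list of targets lives on the collector's side, which is why unknown senders cannot inject data.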

22
How to query the data?

23
Prometheus query language -
PromQL
• Uses a syntax very similar to the metrics
• A duration in square brackets selects a range
vector. Example: ticket_price[5m]
• Plethora of functions and operators for
aggregation: sum, avg, count,
histogram_quantile, et cetera
• Label matchers: = and != compare exactly,
=~ and !~ match regular expressions
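The four label matchers can be imitated in a few lines of Python. This is only a sketch of the semantics: Prometheus uses RE2 regular expressions, which differ from Python's `re` module in some details, and the series and label values below are invented for illustration. Two documented behaviours are reproduced here: regular expressions are fully anchored, and a missing label behaves like an empty string.

```python
import re


def matches(labels, name, op, pattern):
    """Apply one label matcher to the label set of one series."""
    value = labels.get(name, "")  # a missing label behaves like an empty value
    if op == "=":
        return value == pattern
    if op == "!=":
        return value != pattern
    if op == "=~":
        return re.fullmatch(pattern, value) is not None  # fully anchored
    if op == "!~":
        return re.fullmatch(pattern, value) is None
    raise ValueError(f"unknown matcher: {op!r}")


def select(series, matchers):
    """Keep the series for which every matcher holds."""
    return [s for s in series if all(matches(s, *m) for m in matchers)]


SERIES = [
    {"__name__": "ticket_price", "city": "Kaunas"},
    {"__name__": "ticket_price", "city": "Vilnius"},
    {"__name__": "ticket_price", "city": "Klaipeda"},
]

# Roughly what ticket_price{city=~"K.*"} would select.
K_CITIES = select(SERIES, [("__name__", "=", "ticket_price"), ("city", "=~", "K.*")])
```

Because of the anchoring, `city=~"Kaun"` selects nothing here, while `city=~"Kaun.*"` selects Kaunas.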

24
Useful: simple query in the Prometheus UI

25
Useful: calculate network usage
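The query from this slide is not preserved in the text. A typical one, assuming the node_exporter metric `node_network_receive_bytes_total`, would be `rate(node_network_receive_bytes_total[5m])`, which turns an ever-growing byte counter into bytes per second. The Python sketch below shows the idea behind such a calculation; it is deliberately simpler than the real `rate()` function, which also extrapolates to the edges of the time window.

```python
def simple_rate(samples):
    """Per-second increase of a counter.

    samples: list of (timestamp_seconds, counter_value) tuples sorted by time.
    Returns None when there are too few samples to compute a rate.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            # The counter went down, so the process restarted and the
            # counter began again from zero.
            increase += current
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Received bytes sampled every 15 seconds (made-up numbers).
RECEIVED = [(0, 1000), (15, 2500), (30, 4000), (45, 5500), (60, 7000)]
BYTES_PER_SECOND = simple_rate(RECEIVED)
```

Multiplying the result by 8 gives bits per second, which is the unit network links are usually rated in.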

Very rudimentary and simple. How could we make it better?

26
Introducing Grafana

27
Grafana
• Grafana is an open source, feature rich
metrics dashboard and graph editor for
Graphite, Elasticsearch, OpenTSDB,
Prometheus and InfluxDB.
• Very user-friendly
- Dashboards which you can view are
composed of panels
- Different panels can show different
information in any appropriate way
- Has a concept of "organizations" so all of the
dashboards are separated

• Actively maintained, resilient, flexible

28
Grafana dashboard example

29
Intuitive interface

30
Alerting in Prometheus

31
Alerts
- Same expressions as in PromQL
- The only extra things you need to define are
- Extra annotations which give useful information
to the person receiving the alert
- Thresholds
- The name of the alert and its group
- Where to send them
- How to group them

32
Alerting rule example
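The rule shown on the original slide is not preserved in the text. As a stand-in, the Python sketch below renders a rule in the layout of a Prometheus rule file (`groups`, `rules`, `alert`, `expr`, `for`, `labels`, `annotations`). The alert name, the metric `http_requests_total`, the threshold, and the label and annotation values are all hypothetical. Where alerts are sent and how they are grouped is configured separately in Alertmanager and is not covered here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AlertRule:
    name: str       # the name of the alert
    expr: str       # PromQL expression, including the threshold
    duration: str   # how long the expression must hold before firing
    labels: Dict[str, str] = field(default_factory=dict)
    annotations: Dict[str, str] = field(default_factory=dict)


def yaml_quote(text: str) -> str:
    """Wrap text in a double-quoted YAML scalar."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def render_group(group_name: str, rules: List[AlertRule]) -> str:
    """Render one rule group as YAML text."""
    lines = ["groups:", f"  - name: {group_name}", "    rules:"]
    for rule in rules:
        lines.append(f"      - alert: {rule.name}")
        lines.append(f"        expr: {yaml_quote(rule.expr)}")
        lines.append(f"        for: {rule.duration}")
        for section, mapping in (("labels", rule.labels), ("annotations", rule.annotations)):
            if mapping:
                lines.append(f"        {section}:")
                for key, value in mapping.items():
                    lines.append(f"          {key}: {yaml_quote(value)}")
    return "\n".join(lines) + "\n"


# Hypothetical rule: fire when 5xx responses exceed 10 per second for 5 minutes.
EXAMPLE_RULE = AlertRule(
    name="HighErrorRate",
    expr='sum(rate(http_requests_total{status=~"5.."}[5m])) > 10',
    duration="5m",
    labels={"severity": "page"},
    annotations={"summary": "More than 10 server errors per second for 5 minutes"},
)

RULE_FILE_TEXT = render_group("example", [EXAMPLE_RULE])
```

Printing `RULE_FILE_TEXT` gives a rule file that could be checked with `promtool check rules` before loading it.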

33
Adform's experience so far

34
Actively used central monitoring service

35
Available all over the world for developers/IT

36
Alerts

37
Alerts in Slack

38
Visibility and transparency

39
giedrius.statkevicius@adform.com
https://giedrius.blog
@stag1e