
@ThanosMetrics Devoxx UK - November 2nd, 2021

Thanos: Scaling Prometheus 101

Giedrius Statkevičius, Site Reliability Engineer (Vinted)


giedrius.statkevicius@gmail.com
GiedriusS / stag1e
@ThanosMetrics

Giedrius Statkevičius

▪ Site Reliability Engineer @ Vinted

▪ Observability team

▪ OSS Contributor

○ Thanos

○ grafana-tools/sdk

▪ Maintaining a blog at https://giedrius.blog



Thanos Community

● Fully open source from the start


● Started in Nov 2017
● CNCF Incubating project!

● 9600+ GitHub stars (+1600)
● 398+ contributors (+72), so +2x growth since 2019
● ~2794 Slack users (+765)
● 9 maintainers, 4 triagers from 8 different companies

● Transparent Governance (max 2 votes per company)

● Prometheus Ecosystem

Production Users

54 companies are known to be using Thanos in production, and growing!

🤔

Prometheus

[Diagram: Prometheus with its Scrape Engine, Local Storage, Compactor, Query Engine, and Rule + Alert Engine; SVC 1, SVC 2, and SVC 3 each expose /metrics; Alertmanager and Grafana are shown alongside.]

Prometheus Limitations

[Diagram: SVC 1, SVC 2, SVC 3, SVC 4; NODE 1 and NODE 2.]

Prometheus Limitations

[Diagram: SVC 1, SVC 2, SVC 3, SVC 4; NODE 1 and NODE 2, annotated "The same time series".]


Thanos Features

Typical HA Sidecar-Based Set Up

[Diagram: Monitoring Cluster with two Query instances; gRPC links to Cluster 1, Cluster 2, ..., Cluster N.]

Typical HA Sidecar-Based Set Up

[Diagram: Monitoring Cluster with two Query instances and Object Storage; Cluster 1, Cluster 2, ..., Cluster N.]

Typical HA Sidecar-Based Set Up

[Diagram: Monitoring Cluster with two Query instances and Object Storage. Inset: Compactor and two Thanos Store instances, "Or" Load balancer N and Load balancer N+1.]

Typical HA Sidecar-Based Set Up

[Diagram: Monitoring Cluster with Compact, Object Storage, Query, Store, Ruler, and Alertmanager; Cluster 1, Cluster 2, ..., Cluster N.]

[Diagram: two Object Storage instances joined by two "Replicate" arrows.]

Typical HA Sidecar-Based Set Up

[Diagram: Monitoring Cluster with two Query-Frontend instances. Inset: two Thanos Query instances, "Or" Load balancer N and Load balancer N+1.]

Store API

Multi-Region HA Sidecar-Based Set Up

[Diagram: a Query instance in the Monitoring Cluster connects over gRPC to a Query instance in each of Cluster 1, Cluster 2, ..., Cluster N.]

Cortex? Thanos?

Recent New Features



The Most Important Feature



Federated APIs

[Diagram: Query requests metrics metadata, targets, exemplars, ... from a Sidecar and from another Query; that second Query performs another fan-out to a Store and a Sidecar.]
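
The fan-out on this slide can be illustrated with a small sketch. It is not the Thanos implementation: the endpoints are plain Python callables standing in for Sidecar, Store, and Query components, and every name here (`fan_out`, `sidecar_a`, `sidecar_b`, `downstream_query`) is hypothetical. It only shows the shape of the idea: ask every downstream component, merge the answers, and let a downstream Query fan out again.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical stand-in for a downstream component (Sidecar, Store, or Query).
Endpoint = Callable[[], List[Dict[str, str]]]


def fan_out(endpoints: List[Endpoint]) -> List[Dict[str, str]]:
    """Call every endpoint concurrently and merge the answers,
    dropping items that are exact duplicates of one already seen."""
    with ThreadPoolExecutor(max_workers=max(1, len(endpoints))) as pool:
        # pool.map keeps the responses in the same order as the endpoints.
        responses = list(pool.map(lambda endpoint: endpoint(), endpoints))

    merged: List[Dict[str, str]] = []
    seen = set()
    for response in responses:
        for item in response:
            key = tuple(sorted(item.items()))
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged


# Made-up target metadata returned by two pretend sidecars.
def sidecar_a() -> List[Dict[str, str]]:
    return [{"job": "svc1", "instance": "10.0.0.1:9090"}]


def sidecar_b() -> List[Dict[str, str]]:
    return [{"job": "svc2", "instance": "10.0.0.2:9090"}]


def downstream_query() -> List[Dict[str, str]]:
    # A Query behind a Query: it simply fans out again.
    return fan_out([sidecar_b])


targets = fan_out([sidecar_a, downstream_query, sidecar_b])
print(targets)
```

Because `sidecar_b` is reachable both directly and through `downstream_query`, its target arrives twice and is kept once.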

Offline Deduplication

[Diagram: Compact reads blocks from Object Storage, with block data grouped by external labels, and writes a new block with deduplicated data.]
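
A toy sketch of the idea on this slide follows. It is a deliberate simplification and not the algorithm Thanos uses: blocks are plain dictionaries, the replica label is assumed to be called `replica`, and on a timestamp clash the first replica's value simply wins. The function name `dedup_blocks` and the sample data are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (timestamp in milliseconds, value)


def dedup_blocks(blocks: List[dict], replica_label: str = "replica") -> List[dict]:
    """Group blocks by their external labels with the replica label removed,
    then merge each group into one block with one sample per timestamp."""
    groups = defaultdict(list)
    for block in blocks:
        labels = {
            name: value
            for name, value in block["external_labels"].items()
            if name != replica_label
        }
        groups[tuple(sorted(labels.items()))].append(block)

    deduplicated = []
    for key, members in groups.items():
        merged: Dict[int, float] = {}
        for block in members:
            for timestamp, value in block["samples"]:
                # Simplification: keep the first value seen for a timestamp.
                merged.setdefault(timestamp, value)
        deduplicated.append(
            {"external_labels": dict(key), "samples": sorted(merged.items())}
        )
    return deduplicated


# Two replicas of the same cluster plus one unrelated block (made-up data).
blocks = [
    {"external_labels": {"cluster": "eu1", "replica": "0"},
     "samples": [(1000, 1.0), (2000, 2.0)]},
    {"external_labels": {"cluster": "eu1", "replica": "1"},
     "samples": [(2000, 2.0), (3000, 3.0)]},
    {"external_labels": {"cluster": "us1", "replica": "0"},
     "samples": [(1000, 5.0)]},
]

result = dedup_blocks(blocks)
for block in result:
    print(block)
```

The two `eu1` replicas collapse into a single block whose samples cover the union of their timestamps, while the `us1` block passes through with only its replica label dropped.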



What’s next?

Summary

Getting Started

● Katacoda tutorials at https://katacoda.com/thanos
● Myriad of ways to deploy:
  https://goatlas.io/,
  https://github.com/bitnami/charts/tree/master/bitnami/thanos/,
  https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/thanos.md,
  and others

Getting Involved

● Lots of fun problems to solve!


● Main repository is http://github.com/thanos-io/thanos, various community projects are at https://github.com/thanos-community/
● Thanos community hours where you can get unblocked quickly and/or discuss potential features
● Get involved via LFX / GSoC - so far we have had 20 mentees



Thank You!

Giedrius Statkevičius, Site Reliability Engineer (Vinted)


giedrius.statkevicius@gmail.com
GiedriusS / stag1e
