
Monitoring with Elastic

Machine Learning

Simon Quain & Jamie Lynch


Monitoring Solutions, Sky
Agenda

1 Introduction
2 How it started
3 Projects – Sky Mobile
4 Projects – OTT Targeted Advertising
5 Lessons learned

2
Introduction

Who are Sky?

• 7 Countries
• 24m Customers
• £6bn Content investment
• 31,000+ Employees
• £11bn Revenue
• 100+ Original productions

3
Introduction

Who are Monitoring Solutions?

• Service & Performance Monitoring
• Data
• Dashboards
• Alerting

8
How it started

Why use Elastic?

• Collating data from multiple sources
• Troubleshooting and investigating issues
• Visualizing and presenting data
Sources

Ingests
Application & Syslogs

10
Moving to ECE

• Machine Learning
• Watcher
• Users and Roles
• Cluster per use case
• Simplified Upgrades

11
Synthetic Monitoring

12
Network Monitoring

13
Alerts Viewer

14
Why Anomaly Detection

17

Sky Mobile

Monitoring our mobile proposition

21
A few terms

• OTT – Over The Top
– Delivering TV over the internet (Sky Go, NOW TV, Netflix, etc.)
• VOD – Video On Demand
– Streaming or downloading content on demand, such as bingeing a Game of Thrones box set
• Linear
– Streaming live content as it plays out, such as watching live football
OTT Targeted Advertising

“Different ads can be shown to different households watching the same programme. This means advertisers can promote on national channels, but to relevant audiences.”

23
OTT Targeted Advertising

The Data

Ad Insertion Engine

EventType R = Request
– Read each log line with this parameter into cache and record the Transaction ID

EventType I = Impression
– Match these logs, which arrive around 20 minutes later, to the earlier request using the ID

Insert into Elastic
– Add field ‘HasImpression’ (True/False) depending on whether EventType I turned up or not
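The matching step above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used at Sky: the `LogLine` class, the `match_impressions` function and the `site_section` field are hypothetical names, and the sketch processes one batch of log lines rather than holding requests in a cache for around 20 minutes.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass
class LogLine:
    """One ad insertion log line (hypothetical shape)."""
    event_type: str          # "R" = Request, "I" = Impression
    transaction_id: str
    site_section: str = ""   # assumed field, used here only for illustration


def match_impressions(lines: Iterable[LogLine]) -> List[dict]:
    """Return one document per request, flagged with HasImpression."""
    requests: Dict[str, LogLine] = {}
    impression_ids: Set[str] = set()

    for line in lines:
        if line.event_type == "R":
            # Remember the request, keyed by its Transaction ID.
            requests[line.transaction_id] = line
        elif line.event_type == "I":
            # Record that an impression turned up for this ID.
            impression_ids.add(line.transaction_id)

    # One document per request, ready to be indexed.
    return [
        {
            "TransactionId": tid,
            "SiteSection": req.site_section,
            "HasImpression": tid in impression_ids,
        }
        for tid, req in requests.items()
    ]
```

For example, a request `t1` followed by an impression `t1` yields `HasImpression: True`, while a request `t2` with no impression yields `HasImpression: False`.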

24
OTT Targeted Advertising

The Data

[Charts: Primary Storage (TB) and Document Count (billion), each plotted from Feb to Sep; axis ticks from 0 to 1.8]
25
OTT Targeted Advertising

The Process

data*
– Elastic index where all of the logs are stored

Datafeed → ML Job (three datafeed and job pairs shown)

.ml_anomalies
– Elastic index where all of the generated anomalies are stored
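As a rough illustration of what one datafeed and job pair might look like, the sketch below builds request bodies of the kind accepted by the Elasticsearch create anomaly detection job and create datafeed APIs. The detector choice (`high_count` partitioned by `SiteSection`), the 15 minute bucket span, the `@timestamp` time field and the filter on `HasImpression` are all assumptions made for this example; the slides do not state the actual job configuration.

```python
import json


def build_job(description: str, partition_field: str, bucket_span: str = "15m") -> dict:
    """Body for an anomaly detection job (assumed detector and bucket span)."""
    return {
        "description": description,
        "analysis_config": {
            "bucket_span": bucket_span,
            "detectors": [
                # Flag unusually high event counts, modelled per partition value.
                {"function": "high_count", "partition_field_name": partition_field}
            ],
            "influencers": [partition_field],
        },
        # Assumed name of the timestamp field in the log documents.
        "data_description": {"time_field": "@timestamp"},
    }


def build_datafeed(job_id: str, index_pattern: str = "data*") -> dict:
    """Body for a datafeed that reads only requests with no impression."""
    return {
        "job_id": job_id,
        "indices": [index_pattern],
        "query": {"term": {"HasImpression": False}},
    }


if __name__ == "__main__":
    # "missing-impressions" is a hypothetical job ID.
    print(json.dumps(build_job("Requests without impressions", "SiteSection"), indent=2))
    print(json.dumps(build_datafeed("missing-impressions"), indent=2))
```

Keeping the bodies as plain dictionaries makes it straightforward to generate several similar jobs, one per use case, from the same template.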

26
The Process

breaks*
– Elastic index where information about ad breaks is stored

.ml_anomalies
– Elastic index where all of the generated anomalies are stored

Other data

Watcher
- Check for anomalies
- Check for breaks
- Query other sources

Central Alert Gateway
- Alerting Config
- Orchestration
- Callout
- Ticket
- ServiceNow
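The first two Watcher checks could be combined along the following lines. This is only a sketch under stated assumptions: it assumes an anomaly should raise an alert when its score passes a threshold and its timestamp falls inside a known ad break, which the slides do not confirm. The function name, the `start` and `end` keys on a break and the default threshold of 75 are hypothetical; `record_score` and `timestamp` mirror field names found in Elastic ML record results.

```python
from typing import List


def anomalies_to_alert(anomalies: List[dict], breaks: List[dict],
                       min_score: float = 75.0) -> List[dict]:
    """Keep anomalies that score at least min_score and occur during an ad break."""

    def in_break(timestamp: int) -> bool:
        # A break is assumed to be a dict with inclusive "start" and "end" times.
        return any(b["start"] <= timestamp <= b["end"] for b in breaks)

    return [
        a for a in anomalies
        if a["record_score"] >= min_score and in_break(a["timestamp"])
    ]
```

An empty result means nothing is forwarded to the Central Alert Gateway; a non-empty result would be the input for building the ticket.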

27
OTT Targeted Advertising

An Example Ticket
Anomalies found in: [job_name]
At around: 2019-03-24 14:40:00

The following anomalies were found:

SiteSection: Linear:Android:Devices
observed value=2000 (expected=100)
[http://somelink.com]

SiteSection: Linear:iOS:Devices
observed value=500 (expected=20)
[http://somelink.com]

Elastic job: [job_id]
Dashboard Link: [http://somelink.com]
Support Page: [http://relevant_support_page.com]
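A ticket body like the example above could be assembled from a list of anomaly records with a small helper. The sketch below is illustrative only: `render_ticket` and the dictionary keys `site_section`, `actual`, `typical` and `link` are hypothetical names chosen for this example, not the format used by the real alerting pipeline.

```python
from typing import List


def render_ticket(job_name: str, job_id: str, when: str, anomalies: List[dict],
                  dashboard_url: str, support_url: str) -> str:
    """Render a plain-text ticket body from anomaly records."""
    lines = [
        f"Anomalies found in: {job_name}",
        f"At around: {when}",
        "",
        "The following anomalies were found:",
        "",
    ]
    for anomaly in anomalies:
        # One three-line entry per anomalous site section.
        lines.append(f"SiteSection: {anomaly['site_section']}")
        lines.append(f"observed value={anomaly['actual']} (expected={anomaly['typical']})")
        lines.append(f"[{anomaly['link']}]")
        lines.append("")
    lines += [
        f"Elastic job: {job_id}",
        f"Dashboard Link: [{dashboard_url}]",
        f"Support Page: [{support_url}]",
    ]
    return "\n".join(lines)
```

Putting the observed and expected values, a dashboard link and a support page in the ticket gives whoever picks it up enough context to start investigating without opening Elastic first.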

28
OTT Targeted Advertising

The Issues

29
OTT Targeted Advertising

The Successes

31
Lessons learned

• ML isn’t a magic bullet
• Understand your data
• Alerting is hard!
• Consistent data makes your life easier
We’re hiring!
Search for Monitoring at workforsky.com
