
Anomaly Detection

How to Find What You Didn't Know to Look For

October 14, 2014


MapR Technologies, confidential
2014 MapR Technologies 1
Anomaly Detection:
How to Find What You Didn't Know to Look For

Ted Dunning, Chief Applications Architect MapR Technologies


Email tdunning@mapr.com tdunning@apache.org
Twitter @Ted_Dunning
Ellen Friedman, Consultant and Commentator
Email ellenf@apache.org
Twitter @Ellen_Friedman



A New Look at Anomaly Detection
by Ted Dunning and Ellen Friedman, June 2014 (published by O'Reilly)

e-book available courtesy of MapR

http://bit.ly/1jQ9QuL



Practical Machine Learning series (O'Reilly)
Machine learning is becoming mainstream
Need pragmatic approaches that take into account real-world business settings:
Time to value
Limited resources
Availability of data
Expertise and cost of team to develop and to maintain system
Look for approaches with big benefits for the effort expended



Anomaly Detection



Who Needs Anomaly Detection?

Utility providers using smart meters


Who Needs Anomaly Detection?

Feedback from manufacturing assembly lines


Who Needs Anomaly Detection?

Monitoring data traffic on communication networks


What is Anomaly Detection?
The goal is to discover rare events
especially those that shouldn't have happened
Find a problem before other people see it
especially before it causes a problem for customers
Why is this a challenge?
I don't know what an anomaly looks like (yet)


Spot the Anomaly



Spot the Anomaly

Looks pretty anomalous to me


Spot the Anomaly

Will the real anomaly please stand up?


Basic idea:
Find normal first



Steps in Anomaly Detection
Build a model: collect and process data for training a model
Use the machine learning model to determine what the normal pattern is
Decide how far from this normal pattern a value must be before you consider it anomalous
Use the anomaly detection (AD) model to detect anomalies in new data
Methods such as clustering for discovery can be helpful
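
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the method from the talk: it assumes "normal" can be summarized by a mean and a standard deviation, and the function names (`fit_normal_model`, `anomaly_score`, `choose_threshold`, `detect`) are hypothetical.

```python
import statistics


def fit_normal_model(training):
    """Steps 1-2: summarize 'normal' as (mean, standard deviation)."""
    return statistics.fmean(training), statistics.stdev(training)


def anomaly_score(x, model):
    """Distance from normal, measured in standard deviations."""
    mean, std = model
    return abs(x - mean) / std


def choose_threshold(training, model, quantile=0.999):
    """Step 3: pick the score that only a small fraction of normal data exceeds."""
    scores = sorted(anomaly_score(x, model) for x in training)
    index = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[index]


def detect(new_data, model, threshold):
    """Step 4: flag new values whose score exceeds the threshold."""
    return [x for x in new_data if anomaly_score(x, model) > threshold]
```

A real system would replace the mean and standard deviation with whatever model fits the signal; the point is only the order of the steps: learn normal, set a distance threshold, then score new data.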



How hard is it to set an alert for anomalies?

Grey data is from normal events; x's are anomalies.


Where would you set the threshold?



Basic idea:
Set adaptive thresholds



What Are We Really Doing?
We want action when something breaks
(dies/falls over/otherwise gets in trouble)
But action is expensive
So we don't want too many false alarms
And we don't want too many false negatives

What's the right threshold to set for alerts?


We need to trade off costs
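
One way to make the trade-off concrete is to put a price on each kind of mistake and pick the threshold with the lowest total. The sketch below is an illustration only: it assumes you have some scores already labeled as normal or anomalous, and the function names and cost values are hypothetical.

```python
def total_cost(threshold, normal_scores, anomaly_scores,
               false_alarm_cost, miss_cost):
    """Cost of alerting on every score strictly above the threshold."""
    false_alarms = sum(1 for s in normal_scores if s > threshold)
    misses = sum(1 for s in anomaly_scores if s <= threshold)
    return false_alarms * false_alarm_cost + misses * miss_cost


def best_threshold(normal_scores, anomaly_scores,
                   false_alarm_cost, miss_cost):
    """Try every observed score as a threshold and keep the cheapest one."""
    candidates = sorted(set(normal_scores) | set(anomaly_scores))
    return min(
        candidates,
        key=lambda t: total_cost(t, normal_scores, anomaly_scores,
                                 false_alarm_cost, miss_cost),
    )
```

When a missed anomaly is expensive the chosen threshold moves down and more alarms fire; when action is expensive it moves up.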



A Second Look


99.9%-ile



New algorithm: t-digest



How Hard Can it Be?

[Diagram: each input x feeds an online summarizer that tracks the 99.9%-ile as threshold t; x is compared against t (x > t?) and an alarm is raised when it exceeds t]
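
The loop in this diagram can be sketched as follows. The summarizer here is a deliberately simple stand-in, not t-digest: it keeps every value in a sorted list, which is exact but uses unbounded memory. The class name `ExactQuantileSummarizer`, the function `alarms`, and the `warmup` parameter are assumptions made for this illustration.

```python
import bisect


class ExactQuantileSummarizer:
    """Stand-in for an online summarizer: stores every value it has seen."""

    def __init__(self):
        self._sorted = []

    def add(self, x):
        bisect.insort(self._sorted, x)

    def quantile(self, q):
        if not self._sorted:
            raise ValueError("no data yet")
        index = min(len(self._sorted) - 1, int(q * len(self._sorted)))
        return self._sorted[index]


def alarms(stream, q=0.999, warmup=100):
    """Yield (index, value) for each value above the running q-quantile."""
    summarizer = ExactQuantileSummarizer()
    for i, x in enumerate(stream):
        # Compare against the threshold learned from earlier values only.
        if i >= warmup and x > summarizer.quantile(q):
            yield i, x
        summarizer.add(x)
```

A sketch such as t-digest serves the same role while keeping only a small, fixed-size summary of the stream.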


Detecting Anomalies in Sporadic Events

[Figure: counts[order(centroids)] plotted against pnorm(centroids[order(centroids)]); x-axis from 0.0 to 1.0, y-axis from 0 to 20000]


Using t-Digest
Apache Mahout uses t-digest as an on-line percentile estimator
very high accuracy for extreme tails
new in Mahout v0.9
t-digest also available elsewhere
in streamlib (open source library on github)
standalone (github and Maven Central)

What's the big deal with anomaly detection?

This looks like a solved problem


Already Done? Etsy Skyline?



What About This?

[Figure: signal made up of offset + noise + pulse1 + pulse2, with points labeled A and B; x-axis from 0 to 15]


Model Delta Anomaly Detection

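[Diagram: the output of the model is subtracted from the input (+/-) to give a delta; an online summarizer tracks the 99.9%-ile of the delta as threshold t; an alarm is raised when delta > t]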


The Real Inside Scoop
The model-delta anomaly detector is really just a sum of random variables
the model we know about already
and a normally distributed error

The output (delta) is (roughly) the log probability of the sum distribution (really $\delta^2$)

Thinking about probability distributions is good
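
To see why the delta relates to a log probability, here is a short worked step. It assumes the error is normally distributed with mean zero and standard deviation $\sigma$ (the symbol $\sigma$ is introduced here for illustration), and writes $\delta$ for the difference between the observed value and the model's prediction.

```latex
p(\delta) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{\delta^{2}}{2\sigma^{2}}\right)
\quad\Longrightarrow\quad
-\log p(\delta) = \frac{\delta^{2}}{2\sigma^{2}} + \log\!\left(\sigma\sqrt{2\pi}\right)
```

Up to a constant offset and scale, the negative log probability is the squared delta, so thresholding the size of the delta amounts to thresholding how improbable the observation is under the model.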

But how do you handle AD in systems with sporadic events?



Spot the Anomaly

Anomaly?



Maybe not!



Where's Waldo?

This is the real anomaly


Normal Isn't Just Normal
What we want is a model of what is normal

What doesn't fit the model is the anomaly

For simple signals, the model can be simple

$x \sim \mathcal{N}(0, \varepsilon)$
The real world is rarely so accommodating


We Do Windows



Windows on the World
The set of windowed signals is a nice model of our original signal
Clustering can find the prototypes
Fancier techniques available using sparse coding

The result is a dictionary of shapes


New signals can be encoded by shifting, scaling, and adding shapes from the dictionary
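
A minimal sketch of the dictionary idea, using only the Python standard library. It is a simplification of what the slide describes: it uses non-overlapping windows and plain k-means, and it encodes each window by its single nearest shape rather than by shifting, scaling, and adding several shapes. All function names are hypothetical.

```python
import random


def windows(signal, width):
    """Cut the signal into non-overlapping windows of the given width."""
    return [signal[i:i + width]
            for i in range(0, len(signal) - width + 1, width)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length windows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means; the returned centroids act as the dictionary of shapes."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old centroid if no window was assigned
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def reconstruction_error(window, dictionary):
    """Squared distance from the window to its closest dictionary shape."""
    return min(sq_dist(window, shape) for shape in dictionary)
```

Windows that resemble the training signal have a small reconstruction error; a window the dictionary cannot explain has a large one, and that error is the one-dimensional signal the threshold detector can watch.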



Most Common Shapes (for EKG)



Reconstructed signal

[Figure panels: original signal; reconstructed signal; reconstruction error. Annotation: < 1 bit / sample]


An Anomaly

The original technique for finding a 1-d anomaly works against the reconstruction error


Close-up of anomaly

Not what you want your heart to do.

And not what the model expects it to do.


A Different Kind of Anomaly





Anomalies among sporadic events



Sporadic Web Traffic to an e-Business Site

It's important to know if traffic is stopped or delayed because of a problem

But visits to the site normally come at varying intervals.

How long after the last event should you begin to worry?



And how do you let your CEO sleep through the night?


Basic idea:
The time interval between events is how you convert sporadic events
into something useful you can measure


Sporadic Events: Finding Normal and Anomalous Patterns
Time between events is much more usable than absolute times

Counts don't link as directly to probability models

The time interval is proportional to $-\log p$

This is a big deal



Event Stream (timing)
Events of various types arrive at irregular intervals
we can assume Poisson distribution

The key question is whether frequency has changed relative to expected values
This shows up as a change in interval

We want an alert as soon as possible


Converting Event Times to Anomaly

99.9%-ile

99.99%-ile



But in the real world, event rates often change


Time Intervals Are Key to Modeling Sporadic Events



Model-Scaled Intervals Solve the Problem



Model Delta Anomaly Detection

[Diagram: the model-delta detector with log p as the quantity being monitored; an online summarizer tracks the 99.9%-ile as threshold t; an alarm is raised when the value exceeds t]


Detecting Anomalies in Sporadic Events

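[Diagram: incoming event times t_i give the interval statistic (t_i - t_{i-n}) / n, scaled using the rate predictor, which is fed by the rate history; the result is compared against threshold t, the 99.97%-ile tracked by a t-digest; an alarm is raised when it exceeds t]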
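
The detector in this diagram might be sketched as below. Several details are assumptions, because the diagram does not survive intact: the interval over the last n events is multiplied by a predicted rate supplied by the caller, and the threshold is a fixed number rather than a quantile tracked by t-digest. The class name `SporadicEventDetector` and its parameters are hypothetical.

```python
from collections import deque


class SporadicEventDetector:
    """Flags events whose rate-scaled n-event interval exceeds a threshold."""

    def __init__(self, n, threshold):
        self.n = n
        self.threshold = threshold
        # Keep the current event time plus the n before it.
        self.times = deque(maxlen=n + 1)

    def observe(self, t, predicted_rate):
        """Record event time t; return True if the scaled interval is too long."""
        self.times.append(t)
        if len(self.times) <= self.n:
            return False  # not enough history yet
        # Mean interval over the last n events, in units of the expected interval.
        scaled = predicted_rate * (t - self.times[0]) / self.n
        return scaled > self.threshold
```

Scaling by the predicted rate is what lets one threshold serve both busy and quiet periods: a long gap raises an alarm only if it is long relative to what the rate predictor expected.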



Slipped Week: Simple Rate Predictor
[Figure: Main Page Traffic. Hits (x 1000), from 0 to 500, by date from Nov 02 to Dec 02, with regions labeled A, B, C, D]


Poisson Distribution
Time between events is exponentially distributed
$\Delta t \sim \lambda e^{-\lambda t}$
This means that long delays are exponentially rare
$P(\Delta t > T) = e^{-\lambda T}$
$-\log P(\Delta t > T) = \lambda T$
If we know $\lambda$ we can select a good threshold
or we can pick a threshold empirically
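
As a small worked example of the formulas above: if the rate is known, the waiting time that should trigger an alert follows directly from the false-alarm probability you are willing to accept. The function names here are hypothetical.

```python
import math


def interval_threshold(rate, false_alarm_probability):
    """T such that P(interval > T) equals the given probability."""
    return -math.log(false_alarm_probability) / rate


def surprise(interval, rate):
    """-log P(interval > observed) for exponentially distributed intervals."""
    return rate * interval
```

For example, with an expected rate of 2 events per minute and an accepted false-alarm probability of 0.001, `interval_threshold(2.0, 0.001)` gives a wait of roughly 3.45 minutes before worrying.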



Seasonality Poses a Challenge

[Figure: Christmas Traffic. Hits / 1000, from 0 to 8, by date from Nov 17 to Dec 27]


Something more is needed

[Figure: Christmas Traffic. Hits / 1000, from 0 to 8, by date from Nov 17 to Dec 27]


We need a better rate predictor

[Diagram repeated from "Detecting Anomalies in Sporadic Events": incoming events, rate predictor with rate history, and a t-digest 99.97%-ile threshold t that triggers the alarm]


A New Rate Predictor for Sporadic Events



Improved Prediction with Adaptive Modeling

[Figure: Christmas Prediction. Hits (x 1000), from 0 to 8, by date from Dec 17 to Dec 29]


Anomaly Detection + Classification: Useful Pair
Use the AD model to detect anomalies in new data
Methods such as clustering for discovery can be helpful
Once you have well-defined models in your system, you may also want to use classification to tag those
Continue to use the AD model to find new anomalies


Recap (out of order)
Anomaly detection is best done with a probability model
-log p is a good way to convert to anomaly measure
Adaptive quantile estimation (t-digest) works for auto-setting thresholds


Recap
Different systems require different models
Continuous time-series
sparse coding to build signal model
Events in time
rate model based on variable-rate Poisson
segregated rate model
Events with labels
language modeling
hidden Markov models



Why Use Anomaly Detection?



Keep in mind

Model normal, then find anomalies

t-digest for adaptive threshold

Probabilistic models for complex patterns
[Figure: offset + noise + pulse1 + pulse2, with points labeled A and B]


Keep in mind
Time intervals are key for sporadic events

Complex time shift to predict rate with seasonality
[Figure: Christmas Prediction. Hits (x 1000) by date, Dec 17 to Dec 29]

Sequence of events reveals phishing attack




Coming in October: Time Series Databases
by Ted Dunning and Ellen Friedman, Oct 2014 (published by O'Reilly)


Thank you for coming today!
