
Anomaly detection
Problem motivation

Machine Learning

Anomaly detection example

Aircraft engine features:
  x1 = heat generated
  x2 = vibration intensity
  ...

Dataset: {x^(1), x^(2), ..., x^(m)}
New engine: x_test

[Scatter plot of the dataset with x1 (heat) on the horizontal axis and x2 (vibration) on the vertical axis; the new engine is plotted against the existing examples.]

Density estimation

Dataset: {x^(1), x^(2), ..., x^(m)}
Is x_test anomalous?

Model p(x) from the data:
  p(x_test) < ε  →  flag anomaly
  p(x_test) ≥ ε  →  OK

[Scatter plot of the dataset, x1 (heat) vs. x2 (vibration); points far from the bulk of the data are flagged as anomalies.]

Anomaly detection example

Fraud detection:
  x^(i) = features of user i's activities
  Model p(x) from data.
  Identify unusual users by checking which have p(x) < ε.

Manufacturing

Monitoring computers in a data center:
  x^(i) = features of machine i
  x1 = memory use, x2 = number of disk accesses/sec,
  x3 = CPU load, x4 = CPU load/network traffic.
  ...
Anomaly detection
Gaussian distribution

Machine Learning

Gaussian (Normal) distribution

Say x ∈ ℝ. If x is distributed Gaussian with mean μ and variance σ², we write x ~ N(μ, σ²)
(the "~" is read "distributed as"; σ is the standard deviation).

p(x; μ, σ²) = (1 / (√(2π) σ)) exp( −(x − μ)² / (2σ²) )
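As a quick illustration, here is a minimal sketch of this density in NumPy; the function name gaussian_pdf and the example values are just for illustration, not part of the lecture.

import numpy as np

def gaussian_pdf(x, mu, sigma2):
    # Univariate Gaussian density p(x; mu, sigma^2).
    return (1.0 / np.sqrt(2.0 * np.pi * sigma2)) * np.exp(-(x - mu) ** 2 / (2.0 * sigma2))

# Example: density of the standard normal N(0, 1) at a few points.
print(gaussian_pdf(np.array([-1.0, 0.0, 1.0]), mu=0.0, sigma2=1.0))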


Gaussian distribution example

The area under the curve (the shaded area) always integrates to one; that is a basic property of probability distributions.


Parameter estimation

Dataset: {x^(1), x^(2), ..., x^(m)},  x^(i) ∈ ℝ

μ  = (1/m) Σ_{i=1}^{m} x^(i)
σ² = (1/m) Σ_{i=1}^{m} (x^(i) − μ)²

Given a data set like this, the Gaussian distribution the data came from might be estimated as roughly the Gaussian with μ at the center of the data and σ (the standard deviation) controlling the width of the bell curve. Under that Gaussian the data has a very high probability of lying in the central region and a low probability of lying further out, so this seems like a reasonable estimate of μ and σ², and a reasonable fit to the data.
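A minimal sketch of these two estimates in NumPy; the array x below is hypothetical 1-D data. Note the 1/m (not 1/(m−1)) normalization, matching the formulas above.

import numpy as np

# Hypothetical 1-D dataset x^(1), ..., x^(m).
x = np.array([4.9, 5.1, 5.0, 4.8, 5.3, 5.2, 4.7, 5.0])

m = len(x)
mu = x.mean()                      # mu = (1/m) * sum_i x^(i)
sigma2 = ((x - mu) ** 2).mean()    # sigma^2 = (1/m) * sum_i (x^(i) - mu)^2
# Equivalently, np.var(x) uses the same 1/m normalization by default.

print(mu, sigma2)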

Anomaly detection
Algorithm

Machine Learning

The problem of estimating the distribution p(x) is sometimes called the problem of density estimation.

Anomaly detection algorithm

1.  Choose features x_j that you think might be indicative of anomalous examples.
2.  Fit parameters μ_1, ..., μ_n, σ_1², ..., σ_n²:
      μ_j  = (1/m) Σ_{i=1}^{m} x_j^(i)
      σ_j² = (1/m) Σ_{i=1}^{m} (x_j^(i) − μ_j)²
3.  Given a new example x, compute p(x):
      p(x) = Π_{j=1}^{n} p(x_j; μ_j, σ_j²)
           = Π_{j=1}^{n} (1 / (√(2π) σ_j)) exp( −(x_j − μ_j)² / (2σ_j²) )

    Anomaly if p(x) < ε
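A compact sketch of these three steps for an n-feature dataset, assuming NumPy; X_train, x_new, and the threshold epsilon are placeholders you would supply (the made-up numbers below are only for illustration).

import numpy as np

def fit_gaussian_params(X):
    # Step 2: per-feature mu_j and sigma_j^2 from an (m, n) training matrix.
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)          # 1/m normalization, matching the slide
    return mu, sigma2

def p(x, mu, sigma2):
    # Step 3: p(x) = product over features of the univariate Gaussian densities.
    densities = (1.0 / np.sqrt(2.0 * np.pi * sigma2)) * np.exp(-(x - mu) ** 2 / (2.0 * sigma2))
    return np.prod(densities)

# Illustrative usage with made-up numbers.
X_train = np.random.RandomState(0).normal(loc=[5.0, 3.0], scale=[1.0, 0.5], size=(1000, 2))
mu, sigma2 = fit_gaussian_params(X_train)

epsilon = 1e-3                      # threshold, e.g. chosen on a cross validation set
x_new = np.array([9.0, 1.0])        # hypothetical new example
print("anomaly" if p(x_new, mu, sigma2) < epsilon else "ok")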

[Plot of p(x) over the two features: the high-p(x) region is labeled "OK" and the low-p(x) region is labeled "Anomaly".]

Anomaly detection example

[Plot of the fitted p(x): examples near the center of the data have high probability; examples far out have low probability.]


Anomaly detection
Developing and evaluating an anomaly detection system

Machine Learning

The importance of real-number evaluation

When developing a learning algorithm (choosing features, etc.), making decisions is much easier if we have a way of evaluating our learning algorithm.

Assume we have some labeled data of anomalous and non-anomalous examples (y = 0 if normal, y = 1 if anomalous).

Training set: x^(1), x^(2), ..., x^(m)  (assume these are normal, i.e. not anomalous, examples)
Cross validation set: (x_cv^(1), y_cv^(1)), ..., (x_cv^(m_cv), y_cv^(m_cv))
Test set: (x_test^(1), y_test^(1)), ..., (x_test^(m_test), y_test^(m_test))


Aircraft engines motivating example

10000 good (normal) engines
20    flawed engines (anomalous)

Training set: 6000 good engines
CV:   2000 good engines (y = 0), 10 anomalous (y = 1)
Test: 2000 good engines (y = 0), 10 anomalous (y = 1)

Alternative (less recommended; reusing the same examples for both the CV and test sets is not good ML practice):
Training set: 6000 good engines
CV:   4000 good engines (y = 0), 10 anomalous (y = 1)
Test: the same 4000 good engines (y = 0), the same 10 anomalous (y = 1)

Algorithm evaluation

Fit the model p(x) on the training set {x^(1), ..., x^(m)}.
On a cross validation/test example (x, y), predict
  y = 1 if p(x) < ε (anomaly)
  y = 0 if p(x) ≥ ε (normal)

Possible evaluation metrics (these are ways to evaluate an anomaly detection algorithm on your cross validation set or test set):
 - True positive, false positive, false negative, true negative
 - Precision/Recall
 - F1-score

Can also use the cross validation set to choose the parameter ε.

Try many different values of ε, and then pick the value of ε that, say, maximizes the F1-score, or that otherwise does well on your cross validation set.
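A sketch of that selection loop, assuming NumPy; p_cv is a placeholder for the densities p(x) already computed on the cross validation set (e.g. with the p function from the algorithm sketch above) and y_cv for the 0/1 labels.

import numpy as np

def f1_score(y_true, y_pred):
    # F1 = 2 * precision * recall / (precision + recall), with y = 1 meaning anomaly.
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_epsilon(p_cv, y_cv):
    # Try many candidate epsilons; keep the one with the best F1 on the CV set.
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), 1000):
        preds = (p_cv < eps).astype(int)     # predict anomaly when p(x) < epsilon
        f1 = f1_score(y_cv, preds)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1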
Anomaly detection
Anomaly detection vs. supervised learning

Machine Learning

Anomaly detection vs. supervised learning

Anomaly detection:
 - Very small number of positive examples (y = 1). (0-20 is common.)
 - Large number of negative (y = 0) examples.
 - Many different "types" of anomalies. Hard for any algorithm to learn from the positive examples what the anomalies look like; future anomalies may look nothing like any of the anomalous examples we've seen so far.

Supervised learning:
 - Large number of positive and negative examples.
 - Enough positive examples for the algorithm to get a sense of what positive examples are like; future positive examples are likely to be similar to the ones in the training set.

Anomaly detection vs. supervised learning

Anomaly detection:
 - Fraud detection
 - Manufacturing (e.g. aircraft engines)
 - Monitoring machines in a data center

Supervised learning:
 - Email spam classification
 - Weather prediction (sunny/rainy/etc.)
 - Cancer classification

(If you have equal numbers of positive and negative examples, then we would tend to treat all of these as supervised learning problems.)


For many other problems faced by various technology companies and so on, we actually are in settings where we have very few, or sometimes zero, positive training examples. There are so many different types of anomalies that we've never seen them before. For those sorts of problems, very often the algorithm that is used is an anomaly detection algorithm.
Anomaly detection
Choosing what features to use

Machine Learning

This looks vaguely Gaussian.

If this is what the data looks like, what is often done is to play with different transformations of the data in order to make it look more Gaussian. The algorithm will usually work okay even if you don't, but if you use these transformations to make your data more Gaussian, it might work a bit better.

All of these are examples of transformation parameters that you can play with in order to make your data look a little bit more Gaussian.

To summarize: if you plot a histogram of the data and find that it looks pretty non-Gaussian, it's worth playing around a little bit with different transformations like these, to see if you can make your data look a little bit more Gaussian before you feed it to your learning algorithm.
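A small sketch of that kind of exploration, assuming NumPy and Matplotlib; the skewed feature is synthetic, and the log(1 + x) transform is just one example of the transformations the lecture has in mind.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=5000)   # hypothetical skewed (non-Gaussian) feature

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(x, bins=50)
ax1.set_title("raw feature (non-Gaussian)")

ax2.hist(np.log1p(x), bins=50)              # log(1 + x): one transformation to try
ax2.set_title("log(1 + x) (closer to Gaussian)")

plt.tight_layout()
plt.show()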

Non-Gaussian features

How do you come up with features for an anomaly detection algorithm? The process is similar to error analysis for supervised learning: train a complete algorithm, run it on a cross validation set, look at the examples it gets wrong, and see if you can come up with extra features that help the algorithm do better on the examples it got wrong in the CV set.
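One way to do that inspection, sketched with NumPy; X_cv, y_cv, p_cv, and epsilon are placeholders for your cross validation examples, their labels, the computed densities, and the chosen threshold.

import numpy as np

def cv_error_analysis(X_cv, y_cv, p_cv, epsilon):
    # Return the CV examples the detector gets wrong, for manual inspection.
    preds = (p_cv < epsilon).astype(int)
    missed_anomalies = X_cv[(y_cv == 1) & (preds == 0)]   # anomalies with deceptively high p(x)
    false_alarms = X_cv[(y_cv == 0) & (preds == 1)]       # normal examples with low p(x)
    return missed_anomalies, false_alarms

# Inspecting these examples often suggests an extra feature that takes on an
# unusually large or small value on the anomalies the current model misses.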

Monitoring computers in a data center

Choose features that might take on unusually large or small values in the event of an anomaly.

x1 = memory use of computer
x2 = number of disk accesses/sec
x3 = CPU load
x4 = network traffic
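As a sketch of this idea (assuming NumPy, with purely illustrative column indices and numbers), one could add a combined feature like the CPU-load/network-traffic ratio mentioned on the earlier data-center slide, since that ratio becomes unusually large when a machine is busy but serving little traffic.

import numpy as np

# Hypothetical feature matrix: columns are x1=memory, x2=disk accesses/sec,
# x3=CPU load, x4=network traffic (one row per machine per time window).
X = np.array([
    [0.62, 110.0, 0.30, 48.0],
    [0.58,  95.0, 0.95,  2.0],   # busy CPU but almost no traffic -- suspicious
    [0.60, 120.0, 0.35, 52.0],
])

cpu = X[:, 2]
net = X[:, 3]
x5 = cpu / net                    # new feature: CPU load / network traffic
X_aug = np.column_stack([X, x5])  # refit the per-feature Gaussians on the augmented data

print(x5)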
