Machine Learning and Data Mining
Bayes Classifiers
Prof. Alexander Ihler
A basic classifier
• Training data D={x(i),y(i)}, Classifier f(x ; D)
– Discrete feature vector x
– f(x ; D) is a contingency table
• Ex: credit rating prediction (bad/good)
– X1 = income (low/med/high)
– How can we make the largest number of correct predictions?
Features # bad # good
X=0 42 15
X=1 338 287
X=2 3 5
A basic classifier
• Training data D={x(i),y(i)}, Classifier f(x ; D)
– Discrete feature vector x
– f(x ; D) is a contingency table
• Ex: credit rating prediction (bad/good)
– X1 = income (low/med/high)
– How can we make the largest number of correct predictions?
– Predict the more likely outcome for each possible observation
Features # bad # good
X=0 42 15
X=1 338 287
X=2 3 5
A basic classifier
• Training data D={x(i),y(i)}, Classifier f(x ; D)
– Discrete feature vector x
– f(x ; D) is a contingency table
• Ex: credit rating prediction (bad/good)
– X1 = income (low/med/high)
– How can we make the largest number of correct predictions?
– Predict the more likely outcome for each possible observation
– Can normalize the counts into a probability: p( y=good | X=c )
Features p( y=bad | X ) p( y=good | X )
X=0 .7368 .2632
X=1 .5408 .4592
X=2 .3750 .6250
– How to generalize?
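As a concrete sketch of this table-based rule (MATLAB-style, like the course snippets; variable names are illustrative), pick the majority class for each feature value and measure the resulting training accuracy:

% Majority-class prediction from the contingency table above.
counts = [ 42 15 ;     % X=0: [# bad, # good]
          338 287 ;    % X=1
            3   5 ];   % X=2
[~, yhat] = max(counts, [], 2);   % most likely column for each feature value
yhat = yhat - 1;                  % relabel: 0 = bad, 1 = good  ->  [0; 0; 1]
accuracy = sum(max(counts, [], 2)) / sum(counts(:));   % fraction predicted correctly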
Bayes rule
• Two events: headache, flu
• p(H) = 1/10
• p(F) = 1/40
• p(H|F) = 1/2
(Venn diagram of the overlapping events H and F)
• You wake up with a headache – what is the chance that you have the flu?
(Example from Andrew Moore's slides)
Bayes rule
• Two events: headache, flu
• p(H) = 1/10
• p(F) = 1/40
• p(H|F) = 1/2
(Venn diagram of the overlapping events H and F)
• P(H & F) = ?
• P(F|H) = ?
(Example from Andrew Moore's slides)
Bayes rule
• Two events: headache, flu
• p(H) = 1/10
• p(F) = 1/40
• p(H|F) = 1/2
(Venn diagram of the overlapping events H and F)
• P(H & F) = p(F) p(H|F) = (1/2) * (1/40) = 1/80
• P(F|H) = ?
(Example from Andrew Moore's slides)
Bayes rule
• Two events: headache, flu
• p(H) = 1/10
• p(F) = 1/40
• p(H|F) = 1/2
(Venn diagram of the overlapping events H and F)
• P(H & F) = p(F) p(H|F) = (1/2) * (1/40) = 1/80
• P(F|H) = p(H & F) / p(H) = (1/80) / (1/10) = 1/8
(Example from Andrew Moore's slides)
Classification and probability
• Suppose we want to model the data
• Prior probability of each class, p(y)
– E.g., fraction of applicants that have good credit
• Distribution of features given the class, p(x | y=c)
– How likely are we to see “x” in users with good credit?
• Joint distribution: p(x, y) = p(y) p(x | y)
• Bayes rule: p( y=c | x ) = p( x | y=c ) p( y=c ) / p(x)
(Use the rule of total probability to calculate the denominator:
p(x) = p(x | y=0) p(y=0) + p(x | y=1) p(y=1) + … )
Bayes classifiers
• Learn “class conditional” models
– Estimate a probability model for each class
• Training data
– Split by class
– Dc = { x(j) : y(j) = c }
• Estimate p(x | y=c) using Dc
• For a discrete x, this recalculates the same table…
Features # bad # good p(x | y=0) p(x | y=1) p(y=0|x) p(y=1|x)
X=0 42 15 42 / 383 15 / 307 .7368 .2632
X=1 338 287 338 / 383 287 / 307 .5408 .4592
X=2 3 5 3 / 383 5 / 307 .3750 .6250
p(y) 383/690 307/690
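A minimal sketch of these estimates (MATLAB-style, matching the counts in the table; it uses implicit array expansion, so recent MATLAB or Octave is assumed):

counts = [ 42 15 ; 338 287 ; 3 5 ];   % rows: X = 0,1,2; columns: y = bad, good
Ny  = sum(counts, 1);                 % per-class totals: [383 307]
py  = Ny / sum(Ny);                   % class priors p(y): [383 307] / 690
pxy = counts ./ Ny;                   % p(x | y=c): each column sums to one
joint = pxy .* py;                    % p(x, y) = p(x | y) p(y)
post  = joint ./ sum(joint, 2);       % p(y | x) by Bayes rule: rows sum to one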
Bayes classifiers
• Learn “class conditional” models
– Estimate a probability model for each class
• Training data
– Split by class
– Dc = { x(j) : y(j) = c }
• Estimate p(x | y=c) using Dc
• For continuous x, can use any density estimate we like
– Histogram
– Gaussian
– …
(Figure: example histogram density estimate for a one-dimensional feature.)
Gaussian models
• Estimate parameters of the Gaussians from the data: for each class c, use the mc
examples in that class to estimate the mean, µc = (1/mc) ∑j x(j), and the variance,
σc² = (1/mc) ∑j ( x(j) - µc )²
(Figure: class-conditional Gaussians plotted along feature x1.)
Multivariate Gaussian models
• Similar to the univariate case:
p(x) = (2π)^(-d/2) |Σ|^(-1/2) exp( -(1/2) (x - µ)T Σ^(-1) (x - µ) )
µ = length-d column vector
Σ = d x d covariance matrix
|Σ| = matrix determinant
• Maximum likelihood estimates:
µ = (1/m) ∑j x(j)
Σ = (1/m) ∑j ( x(j) - µ )( x(j) - µ )T
(Figure: contours of a 2-D Gaussian fit to scattered data.)
Example: Gaussian Bayes for Iris Data
• Fit a Gaussian distribution to each class {0,1,2}
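A minimal sketch of this per-class fit, assuming the fisheriris data that ships with MATLAB's Statistics Toolbox (any feature matrix and label vector would work the same way):

load fisheriris;                      % meas: 150 x 4 features, species: class labels
X = meas;
[names, ~, Y] = unique(species);      % Y in {1,2,3} indexes the three classes
for c = 1:numel(names)
    Xc = X(Y == c, :);                % split the training data by class
    mu{c}    = mean(Xc, 1);           % ML estimate of the class mean
    Sigma{c} = cov(Xc, 1);            % ML covariance (normalizes by m, not m-1)
    prior(c) = size(Xc, 1) / size(X, 1);   % class prior p(y = c)
end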
Machine Learning and Data Mining
Bayes Classifiers: Naïve Bayes
Prof. Alexander Ihler
Bayes classifiers
• Estimate p(y) = [ p(y=0) , p(y=1) …]
• Estimate p(x | y=c) for each class c
• Calculate p(y=c | x) using Bayes rule
• Choose the most likely class c
• For a discrete x, can represent as a contingency table…
– What about if we have more discrete features?
Features # bad # good p(x | y=0) p(x | y=1) p(y=0|x) p(y=1|x)
X=0 42 15 42 / 383 15 / 307 .7368 .2632
X=1 338 287 338 / 383 287 / 307 .5408 .4592
X=2 3 5 3 / 383 5 / 307 .3750 .6250
p(y) 383/690 307/690
Joint distributions
• Make a truth table of all combinations of values:
A B C
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
Joint distributions
• Make a truth table of all combinations of values
• For each combination of values, determine how probable it is
A B C p(A,B,C | y=1)
0 0 0 0.50
0 0 1 0.05
0 1 0 0.01
0 1 1 0.10
1 0 0 0.04
1 0 1 0.15
1 1 0 0.05
1 1 1 0.10
• Total probability must sum to one
• How many values did we specify? (2^3 = 8 entries, of which 7 are free parameters,
since they must sum to one)
Overfitting and density estimation
• Estimate probabilities from the data
– E.g., how many times (what fraction) did each outcome occur?
A B C p(A,B,C | y=1)
0 0 0 4/10
0 0 1 1/10
0 1 0 0/10
0 1 1 0/10
1 0 0 1/10
1 0 1 2/10
1 1 0 1/10
1 1 1 1/10
• M data << 2^N parameters?
• What about the zeros?
– We learn that certain combinations are impossible?
– What if we see these later in test data?
• Overfitting!
Overfitting and density estimation
• Estimate probabilities from the data
– E.g., how many times (what fraction) did each outcome occur?
A B C p(A,B,C | y=1)
0 0 0 4/10
0 0 1 1/10
0 1 0 0/10
0 1 1 0/10
1 0 0 1/10
1 0 1 2/10
1 1 0 1/10
1 1 1 1/10
• M data << 2^N parameters?
• What about the zeros?
– We learn that certain combinations are impossible?
– What if we see these later in test data?
• One option: regularize
• Normalize to make sure values sum to one…
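The slide doesn't name a specific regularizer; one common choice is add-alpha (Laplace) smoothing of the counts, sketched below for the table above:

% Add-alpha (Laplace) smoothing of the 8-outcome table; alpha = 1 is "add-one".
counts = [4 1 0 0 1 2 1 1]';    % counts for the outcomes of (A,B,C) given y=1
alpha  = 1;                     % regularization strength
p = (counts + alpha) / (sum(counts) + alpha * numel(counts));   % still sums to one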
Overfitting and density estimation
• Another option: reduce the model complexity
– E.g., assume that features are independent of one another
• Independence:
• p(a,b) = p(a) p(b)
• p(x1, x2, … xN | y=1) = p(x1 | y=1) p(x2 | y=1) … p(xN | y=1)
• Only need to estimate each individually:

A p(A | y=1)
0 .4
1 .6

B p(B | y=1)
0 .7
1 .3

C p(C | y=1)
0 .1
1 .9

A B C p(A,B,C | y=1)
0 0 0 .4 * .7 * .1
0 0 1 .4 * .7 * .9
0 1 0 .4 * .3 * .1
0 1 1 …
1 0 0 …
1 0 1 …
1 1 0 …
1 1 1 …
Example: Naïve Bayes
Observed Data:
x1 x2 y
1 1 0
1 0 0
1 0 1
0 0 0
0 1 1
1 1 0
0 0 1
1 0 1
• Prediction given some observation x?
Compare p(y=0) p(x1 | y=0) p(x2 | y=0) with p(y=1) p(x1 | y=1) p(x2 | y=1);
for the x in this example the y=0 product is larger, so:
Decide class 0
Example: Naïve Bayes
Observed Data:
x1 x2 y
1 1 0
1 0 0
1 0 1
0 0 0
0 1 1
1 1 0
0 0 1
1 0 1
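A small sketch reproducing the naive Bayes estimates from the eight observations above, and scoring one illustrative test point x = [1 1] (the choice of test point is ours, not from the slide):

% Naive Bayes estimates from the observed data above.
D = [1 1 0; 1 0 0; 1 0 1; 0 0 0; 0 1 1; 1 1 0; 0 0 1; 1 0 1];
X = D(:,1:2);  Y = D(:,3);
for c = [0 1]
    Xc = X(Y == c, :);
    py(c+1)  = mean(Y == c);         % p(y=c)          ->  1/2, 1/2
    px1(c+1) = mean(Xc(:,1) == 1);   % p(x1=1 | y=c)   ->  3/4, 2/4
    px2(c+1) = mean(Xc(:,2) == 1);   % p(x2=1 | y=c)   ->  2/4, 1/4
end
pxv = @(p, v) p.^v .* (1-p).^(1-v);  % returns p if v==1, (1-p) if v==0
x = [1 1];                           % an example observation to classify
score0 = py(1) * pxv(px1(1), x(1)) * pxv(px2(1), x(2));
score1 = py(2) * pxv(px1(2), x(1)) * pxv(px2(2), x(2));
yhat = (score1 > score0);            % here score0 is larger: decide class 0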
Example: Joint Bayes
Observed Data:
x1 x2 y
1 1 0
1 0 0
1 0 1
0 0 0
0 1 1
1 1 0
0 0 1
1 0 1

x1 x2 p(x | y=0)
0 0 1/4
0 1 0/4
1 0 1/4
1 1 2/4

x1 x2 p(x | y=1)
0 0 1/4
0 1 1/4
1 0 2/4
1 1 0/4
Naïve Bayes models
• Variable y to predict, e.g. “auto accident in next year?”
• We have *many* co-observed vars x=[x1…xn]
– Age, income, education, zip code, …
• Want to learn p(y | x1…xn ), to predict y
– Arbitrary distribution: O(d^n) values!
• Naïve Bayes:
– p(y|x) = p(x|y) p(y) / p(x) ; p(x|y) = ∏i p(xi|y)
– Covariates are independent given “cause”
• Note: may not be a good model of the data
– Doesn’t capture correlations in x’s
– Can’t capture some dependencies
• But in practice it often does quite well!
Naïve Bayes models for spam
• y ∈ {spam, not spam}
• X = observed words in email
– Ex: [“the” … “probabilistic” … “lottery”…]
– “1” if word appears; “0” if not
• 1000’s of possible words: 2^1000’s of parameters?
• # of atoms in the universe: ≈ 2^270…
• Model words given email type as independent
• Some words more likely for spam (“lottery”)
• Some more likely for real (“probabilistic”)
• Only 1000’s of parameters now…
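An illustrative sketch of such a spam model with a tiny, made-up vocabulary and made-up word probabilities (real systems estimate these from labeled email):

% Illustrative only: assumed vocabulary, word probabilities, and prior.
vocab   = {'the', 'probabilistic', 'lottery'};
pw_spam = [0.60 0.01 0.20];       % assumed p(word appears | spam)
pw_real = [0.60 0.05 0.001];      % assumed p(word appears | not spam)
p_spam  = 0.4;                    % assumed prior p(spam)
x = [1 0 1];                      % which vocabulary words appear in this email
logBern = @(p, v) sum( v .* log(p) + (1 - v) .* log(1 - p) );
logOdds = log(p_spam)     + logBern(pw_spam, x) ...
        - log(1 - p_spam) - logBern(pw_real, x);
isSpam = (logOdds > 0);           % decide "spam" if the log-odds are positive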
Naïve Bayes Gaussian models
• The naive Bayes assumption gives a diagonal (axis-aligned) covariance:
Σ = [ σ11² 0 ; 0 σ22² ]
Again, reduces the number of parameters of the model:
Bayes: ~ n²/2    Naïve Bayes: n
(Figure: axis-aligned Gaussian contours over (x1, x2), with σ11² > σ22².)
You should know…
• Bayes rule; p( y | x )
• Bayes classifiers
– Learn p( x | y=C ) , p( y=C )
• Naïve Bayes classifiers
– Assume features are independent given class:
p( x | y=C ) = p( x1 | y=C ) p( x2 | y=C ) …
• Maximum likelihood (empirical) estimators for
– Discrete variables
– Gaussian variables
– Overfitting; simplifying assumptions or regularization
Machine Learning and Data Mining
Bayes Classifiers: Measuring Error
Prof. Alexander Ihler
A Bayes classifier
• Given training data, compute p( y=c| x) and choose largest
• What’s the (training) error rate of this method?
Features # bad # good
X=0 42 15
X=1 338 287
X=2 3 5
A Bayes classifier
• Given training data, compute p( y=c| x) and choose largest
• What’s the (training) error rate of this method?
Features # bad # good
X=0 42 15
X=1 338 287
X=2 3 5
Gets these examples wrong: Pr[ error ] = (15 + 287 + 3) / 690
(empirically, on training data; better to use test data)
Bayes Error Rate
• Suppose that we knew the true probabilities:
– Observe any x: we know p( y=c | x ) at that x
– Optimal decision at that particular x is: ŷ(x) = arg max_c p( y=c | x )
– Error rate at that x is: 1 - max_c p( y=c | x )
– Averaging over x gives the overall error rate, the “Bayes error rate”
• This is the best that any classifier can do!
• Measures the fundamental hardness of separating y-values given only the features x
• Note: conceptual only!
– Probabilities p(x,y) must be estimated from data
– Form of p(x,y) is not known and may be very complex
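Purely as an illustration of the definition (not something we can do in practice, since p(x, y) is unknown): if the true model were two assumed 1-D Gaussians with equal priors, the Bayes error rate could be estimated by Monte Carlo:

% Illustrative only: two assumed 1-D Gaussian classes with known parameters.
gauss = @(x, mu, s) exp( -(x - mu).^2 ./ (2*s^2) ) ./ sqrt(2*pi*s^2);
m = 1e5;
y = (rand(m,1) < 0.5);                      % sample the class label (equal priors)
x = randn(m,1) + 2*y;                       % x | y=0 ~ N(0,1),  x | y=1 ~ N(2,1)
post1 = gauss(x,2,1) ./ (gauss(x,0,1) + gauss(x,2,1));   % p(y=1 | x)
bayesErr = mean( min(post1, 1 - post1) );   % error rate of the optimal decision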
A Bayes classifier
• Bayes classification decision rule compares probabilities:
decide ŷ = 0 if p( y=0 | x ) > p( y=1 | x ), else ŷ = 1
equivalently, decide ŷ = 0 if p( x | y=0 ) p(y=0) > p( x | y=1 ) p(y=1)
• Can visualize this nicely if x is a scalar:
(Figure: the joint densities p(x , y=0) and p(x , y=1) plotted against feature x1.
Each curve has the shape of p(x | y=c) and area p(y=c); the decision boundary lies
where the two curves cross.)
A Bayes classifier
• Add a multiplier alpha to the decision rule:
decide ŷ = 0 if α p( x | y=0 ) p(y=0) > p( x | y=1 ) p(y=1), else ŷ = 1
• Not all errors are created equally…
• Risk associated with each outcome?
(Figure: p(x , y=0), p(x , y=1), and the decision boundary, with shaded regions
marking the two types of errors.)
Type 1 errors: false positives
Type 2 errors: false negatives
False positive rate: (# y=0, ŷ=1) / (# y=0)
False negative rate: (# y=1, ŷ=0) / (# y=1)
A Bayes classifier
• Add a multiplier alpha to the decision rule:
decide ŷ = 0 if α p( x | y=0 ) p(y=0) > p( x | y=1 ) p(y=1), else ŷ = 1
• Increase alpha: prefer class 0
• Ex: spam detection, where a false positive (discarding real email) is the costlier error
(Figure: the decision boundary shifts so that more observations are labeled class 0.)
Type 1 errors: false positives
Type 2 errors: false negatives
False positive rate: (# y=0, ŷ=1) / (# y=0)
False negative rate: (# y=1, ŷ=0) / (# y=1)
A Bayes classifier
• Add a multiplier alpha to the decision rule:
decide ŷ = 0 if α p( x | y=0 ) p(y=0) > p( x | y=1 ) p(y=1), else ŷ = 1
• Decrease alpha: prefer class 1
• Ex: cancer detection, where a false negative (missing a true case) is the costlier error
(Figure: the decision boundary shifts so that more observations are labeled class 1.)
Type 1 errors: false positives
Type 2 errors: false negatives
False positive rate: (# y=0, ŷ=1) / (# y=0)
False negative rate: (# y=1, ŷ=0) / (# y=1)
Measuring errors
• Confusion matrix
• Can extend to more classes
        Predict 0   Predict 1
Y=0     380         5
Y=1     338         3
• True positive rate: #(y=1 , ŷ=1) / #(y=1) -- “sensitivity”
• False negative rate: #(y=1 , ŷ=0) / #(y=1)
• False positive rate: #(y=0 , ŷ=1) / #(y=0)
• True negative rate: #(y=0 , ŷ=0) / #(y=0) -- “specificity”
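A small sketch computing these four rates from a 2x2 confusion matrix laid out as above (rows = true class, columns = prediction):

% Rows = true class (y=0, y=1); columns = prediction (yhat=0, yhat=1).
C = [380 5; 338 3];                 % confusion matrix from the slide
TPR = C(2,2) / sum(C(2,:));         % true positive rate  (sensitivity)
FNR = C(2,1) / sum(C(2,:));         % false negative rate
FPR = C(1,2) / sum(C(1,:));         % false positive rate
TNR = C(1,1) / sum(C(1,:));         % true negative rate  (specificity)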
Likelihood ratio tests
• Connection to classical, statistical decision theory:
decide ŷ = 1 if log [ p(x | y=1) / p(x | y=0) ] > log [ p(y=0) / p(y=1) ]
The left-hand side is the “log likelihood ratio”
• Likelihood ratio: relative support for observation “x” under the
“alternative hypothesis” y=1, compared to the “null hypothesis” y=0
• Can vary the decision threshold:
decide ŷ = 1 if the log likelihood ratio exceeds a threshold gamma
• Classical testing:
– Choose gamma so that the FPR is fixed (the “p-value”)
– Given that y=0 is true, what’s the probability we decide y=1?
ROC Curves
• Characterize performance as we vary the decision threshold?
(Figure: ROC curve traced out by the Bayes classifier as the multiplier alpha varies.
Axes: false positive rate = 1 - specificity (horizontal) vs. true positive rate =
sensitivity (vertical). “Guess all 1” sits at the top-right corner, “guess all 0” at
the bottom-left, and “guess at random with proportion alpha” lies along the diagonal.)
Probabilistic vs. Discriminative learning
“Discriminative” learning: output a prediction ŷ(x)
“Probabilistic” learning: output a probability p(y|x) (expresses confidence in outcomes)
• “Probabilistic” learning
– Conditional models just explain y: p(y|x)
– Generative models also explain x: p(x,y)
• Often a component of unsupervised or semi-supervised learning
– Bayes and Naïve Bayes classifiers are generative models
Probabilistic vs. Discriminative learning
“Discriminative” learning: output a prediction ŷ(x)
“Probabilistic” learning: output a probability p(y|x) (expresses confidence in outcomes)
• Can use ROC curves for discriminative models also:
– Some notion of confidence, but doesn’t correspond to a probability
– In our code: “predictSoft” (vs. hard prediction, “predict”)
>> learner = gaussianBayesClassify(X, Y); % build a classifier
>> Ysoft = predictSoft(learner, X); % N x C matrix of confidences
>> plotSoftClassify2D(learner, X, Y); % shaded confidence plot
ROC Curves
• Characterize performance as we vary our confidence threshold?
(Figure: ROC curves for two classifiers A and B on the same axes: false positive
rate = 1 - specificity vs. true positive rate = sensitivity, with the “guess all 0”,
“guess all 1”, and random-guessing reference points.)
• Reduce performance to one number?
AUC = “area under the ROC curve”
0.5 < AUC < 1
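A minimal sketch (independent of the course toolbox) of how an ROC curve and its AUC can be computed from soft scores by sweeping the threshold over the sorted scores; the labels and scores below are made-up example values, and ties are ignored for simplicity:

labels = [0; 0; 1; 0; 1; 1; 1; 0];             % example true classes
scores = [.1; .4; .35; .8; .65; .9; .7; .2];   % example confidences for class 1
[~, idx] = sort(scores, 'descend');            % sweep threshold from high to low
y   = labels(idx);
TPR = cumsum(y == 1) / sum(y == 1);            % true positive rate per threshold
FPR = cumsum(y == 0) / sum(y == 0);            % false positive rate per threshold
AUC = trapz([0; FPR], [0; TPR]);               % area under the ROC curve
plot([0; FPR], [0; TPR]); xlabel('false positive rate'); ylabel('true positive rate');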
Machine Learning and Data Mining
Gaussian Bayes Classifiers
Prof. Alexander Ihler
Gaussian models
• “Bayes optimal” decision
– Choose most likely class
• Decision boundary
– Places where probabilities equal
• What shape is the boundary?
Gaussian models
• Bayes optimal decision boundary
– p(y=0 | x) = p(y=1 | x)
– Transition point between p(y=0|x) >/< p(y=1|x)
• Assume Gaussian models with equal covariances
Gaussian example
• Spherical covariance: Σ = σ² I
• Decision rule: compare p( x | y=0 ) p(y=0) with p( x | y=1 ) p(y=1); with equal
spherical covariances the boundary between the two classes is linear
(Figure: contours of two spherical Gaussians in 2-D, separated by a linear decision
boundary.)
Non-spherical Gaussian distributions
• Equal covariances => still a linear decision rule
– May be “modulated” by the variance direction
– Scales; rotates (if correlated)
• Ex: variance Σ = [ 3 0 ; 0 .25 ]
(Figure: elongated, axis-aligned Gaussian contours for the two classes and the
resulting linear decision boundary.)
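A hedged sketch of why equal covariances give a linear rule: the log ratio of the two class densities reduces to a linear function of x, so the classifier and its posterior can be built directly from the estimated parameters (the means, priors, and test point below are illustrative; the shared covariance matches the example above):

% Illustrative: two 2-D Gaussian classes sharing one covariance matrix.
mu0 = [0; 0];  mu1 = [2; 1];          % class means (d x 1 columns)
Sigma = [3 0; 0 .25];                 % shared covariance
p0 = 0.5;  p1 = 0.5;                  % class priors
w = Sigma \ (mu1 - mu0);              % linear weights: Sigma^{-1} (mu1 - mu0)
b = -0.5 * ( mu1' * (Sigma \ mu1) - mu0' * (Sigma \ mu0) ) + log(p1 / p0);
x = [1; 0.5];                         % a test point
a = w' * x + b;                       % a = log [ p(x, y=1) / p(x, y=0) ]
yhat = (a > 0);                       % linear decision rule
py1  = 1 / (1 + exp(-a));             % posterior p(y=1 | x) = logistic(a)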
Class posterior probabilities
• Useful to also know class probabilities
• Some notation
– p(y=0) , p(y=1) – class prior probabilities
• How likely is each class in general?
– p(x | y=c) – class conditional probabilities
• How likely are observations “x” in that class?
– p(y=c | x) – class posterior probability
• How likely is class c given an observation x?
Class posterior probabilities
• Useful to also know class probabilities
• Some notation
– p(y=0) , p(y=1) – class prior probabilities
• How likely is each class in general?
– p(x | y=c) – class conditional probabilities
• How likely are observations “x” in that class?
– p(y=c | x) – class posterior probability
• How likely is class c given an observation x?
• We can compute posterior using Bayes’ rule
– p(y=c | x) = p(x|y=c) p(y=c) / p(x)
• Compute p(x) using sum rule / law of total prob.
– p(x) = p(x|y=0) p(y=0) + p(x|y=1)p(y=1)
Class posterior probabilities
• Consider comparing two classes
– p(x | y=0) * p(y=0) vs p(x | y=1) * p(y=1)
– Write probability of each class as
– p(y=0 | x) = p(y=0, x) / p(x)
– = p(y=0, x) / ( p(y=0,x) + p(y=1,x) )
– = 1 / (1 + exp( -a ) ) (**)
– a = log [ p(x|y=0) p(y=0) / p(x|y=1) p(y=1) ]
– (**) called the logistic function, or logistic sigmoid.
Gaussian models
• Return to Gaussian models with equal covariances: plugging the Gaussians into (**),
the log-ratio a becomes a linear function of x
Now we also know that the probability of each class is given by:
p( y=0 | x ) = Logistic( a ) = Logistic( aT x + b )
We’ll see this form again soon…