
Lecture 2

Mehryar Mohri

Courant Institute and Google Research

mohri@cims.nyu.edu

PAC Model, Guarantees for Learning with Finite Hypothesis Sets

Motivation

Some computational learning questions:

- What can be learned efficiently?
- What is inherently hard to learn?
- Is there a general model of learning?

Complexity

- Computational complexity: time and space.
- Sample complexity: amount of training data needed to learn successfully.
- Mistake bounds: number of mistakes before learning successfully.


This lecture

- PAC Model
- Sample complexity, finite H, consistent case
- Sample complexity, finite H, inconsistent case

Definitions and Notation

- X: set of all possible instances or examples, e.g., instances described by their height and weight.
- c: X → {0, 1}: the target concept to learn; it can be identified with its support {x ∈ X : c(x) = 1}.
- D: target distribution, a fixed probability distribution over X. Training and test examples are drawn according to D.

Mehryar Mohri - Foundations of Machine Learning


- S: training sample.
- H: set of concept hypotheses, e.g., the set of all linear classifiers.
- The learning algorithm receives the sample S and selects a hypothesis h_S from H approximating c.

Errors

True error or generalization error of h with respect to the target concept c and distribution D:

R(h) = \Pr_{x \sim D}[h(x) \neq c(x)] = \mathbb{E}_{x \sim D}[1_{h(x) \neq c(x)}].

Empirical error: average error of h on a training sample S drawn according to distribution D,

\widehat{R}_S(h) = \frac{1}{m} \sum_{i=1}^{m} 1_{h(x_i) \neq c(x_i)}.

Note: R(h) = \mathbb{E}_{S \sim D^m}[\widehat{R}_S(h)].
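The two definitions above can be illustrated with a short sketch. The target concept, hypothesis, and thresholds below are illustrative choices, not from the lecture; the true error of this pair under the uniform distribution is 0.1 (the mass of the interval where they disagree), and the empirical error on a large sample concentrates around it.

```python
import random

def target(x):          # target concept c (illustrative)
    return 1 if x >= 0.5 else 0

def hypothesis(x):      # some fixed hypothesis h (illustrative)
    return 1 if x >= 0.6 else 0

def empirical_error(h, c, sample):
    # \hat{R}_S(h) = (1/m) sum_i 1_{h(x_i) != c(x_i)}
    return sum(h(x) != c(x) for x in sample) / len(sample)

random.seed(0)
S = [random.random() for _ in range(10_000)]  # S ~ D^m, D uniform on [0, 1]
print(empirical_error(hypothesis, target, S))  # close to the true error 0.1
```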

PAC Model

(Valiant, 1984)

PAC learning: Probably Approximately Correct learning.

Definition: a concept class C is PAC-learnable if there exists a learning algorithm L such that for all c \in C, \epsilon > 0, \delta > 0, and all distributions D,

\Pr_{S \sim D^m}[R(h_S) \leq \epsilon] \geq 1 - \delta,

for a sample size m = \mathrm{poly}(1/\epsilon, 1/\delta) for a fixed polynomial.

Remarks

- Concept class C is known to the algorithm.
- Distribution-free model: no assumption on D.
- Both training and test examples are drawn according to D.
- Probably: confidence 1 − δ; approximately correct: accuracy ε.
- What about the cost of the representation of c ∈ C?

Computational representation:

- cost for x \in X in O(n),
- cost for c \in C in O(\mathrm{size}(c)).

Extension (efficient PAC-learning): the running-time requirement O(\mathrm{poly}(1/\epsilon, 1/\delta)) becomes O(\mathrm{poly}(1/\epsilon, 1/\delta, n, \mathrm{size}(c))).

Axis-Aligned Rectangles

Problem: learn an unknown axis-aligned rectangle R using as small a labeled sample as possible.

Hypothesis: rectangle R'. In general, there may be false positive and false negative points.

Simple method: choose the tightest consistent rectangle R' for a large enough sample. How large a sample? Is this class PAC-learnable?

Fix \epsilon > 0 and assume \Pr_D[R] > \epsilon (otherwise the result is trivial).

Let r_1, r_2, r_3, r_4 be the four smallest rectangles along the sides of R such that \Pr_D[r_i] \geq \epsilon/4. For example, writing R = [l, r] \times [b, t], the left strip is

r_4 = [l, s_4] \times [b, t], \quad s_4 = \inf\{s : \Pr_D[[l, s] \times [b, t]] \geq \epsilon/4\}.

Errors can only occur in R \setminus R'. Thus (geometry), if R(R') > \epsilon, then R' misses at least one region r_i. Since R' misses r_i only when no training point falls in r_i, and each point falls in r_i with probability at least \epsilon/4,

\Pr[R(R') > \epsilon] \leq \Pr\Big[\bigcup_{i=1}^{4} \{R' \text{ misses } r_i\}\Big] \leq \sum_{i=1}^{4} \Pr[\{R' \text{ misses } r_i\}] \leq 4\Big(1 - \frac{\epsilon}{4}\Big)^m \leq 4 e^{-m\epsilon/4}.

Set \delta > 0 to match the upper bound:

4 e^{-m\epsilon/4} \leq \delta \iff m \geq \frac{4}{\epsilon} \log \frac{4}{\delta}.

Then, for m \geq \frac{4}{\epsilon} \log \frac{4}{\delta}, with probability at least 1 - \delta, R(R') \leq \epsilon.
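The tightest-consistent-rectangle method and the sample size from this bound can be sketched as follows. The target rectangle R and the uniform distribution on the unit square are illustrative choices, not from the lecture; the key invariant is that the tightest rectangle around the positive points is always contained in R, so errors can only occur in R \ R'.

```python
import math
import random

def tightest_rectangle(points):
    # Smallest axis-aligned rectangle containing all positive points.
    pos = [(x, y) for (x, y, label) in points if label == 1]
    if not pos:
        return None  # no positive point seen: empty hypothesis
    xs, ys = [x for x, _ in pos], [y for _, y in pos]
    return (min(xs), max(xs), min(ys), max(ys))

eps, delta = 0.1, 0.05
m = math.ceil(4 / eps * math.log(4 / delta))  # sample size from the bound

R = (0.2, 0.7, 0.3, 0.8)  # unknown target rectangle (l, r, b, t), illustrative
def c(x, y):              # target concept: indicator of R
    l, r, b, t = R
    return int(l <= x <= r and b <= y <= t)

random.seed(0)
S = [(x, y, c(x, y)) for x, y in
     ((random.random(), random.random()) for _ in range(m))]
Rp = tightest_rectangle(S)  # hypothesis R', always contained in R
```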

Notes

- Infinite hypothesis set, but a simple proof.
- Does this proof readily apply to other similar concept classes?
- Geometric properties are key in this proof; in general it is non-trivial to extend to other classes, e.g., non-concentric circles (see HW2, 2006).

This lecture

- PAC Model
- Sample complexity, finite H, consistent case
- Sample complexity, finite H, inconsistent case

Consistent Case

Theorem: let H be a finite set of functions from X to {0, 1} and L an algorithm that for any target concept c \in H and sample S returns a consistent hypothesis h_S: \widehat{R}_S(h_S) = 0. Then, for any \delta > 0, with probability at least 1 - \delta,

R(h_S) \leq \frac{1}{m}\Big(\log |H| + \log \frac{1}{\delta}\Big).

Consistent Case

Proof: fix \epsilon > 0. For any fixed h \in H, since the m training points are drawn i.i.d. from D,

\Pr[h \text{ consistent} \wedge R(h) > \epsilon] \leq (1 - \epsilon)^m.

Thus, by the union bound,

\Pr[\exists h \in H : h \text{ consistent} \wedge R(h) > \epsilon]
= \Pr[(h_1 \text{ consistent} \wedge R(h_1) > \epsilon) \vee \cdots \vee (h_{|H|} \text{ consistent} \wedge R(h_{|H|}) > \epsilon)]
\leq \sum_{h \in H} \Pr[h \text{ consistent} \wedge R(h) > \epsilon]
\leq \sum_{h \in H} (1 - \epsilon)^m = |H|(1 - \epsilon)^m \leq |H| e^{-m\epsilon}.

Setting the right-hand side to \delta and solving for \epsilon yields the theorem.

Remarks

- The algorithm L can be ERM if the problem is realizable.
- Error bound linear in 1/m and only logarithmic in 1/\delta.
- \log_2 |H| is the number of bits used for the representation of H.
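Rearranging the consistent-case bound gives a sufficient sample size, m \geq (1/\epsilon)(\log |H| + \log(1/\delta)); a minimal sketch, with illustrative numbers:

```python
import math

def consistent_sample_size(H_size, eps, delta):
    # m >= (1/eps)(log|H| + log(1/delta)) guarantees R(h_S) <= eps
    # with probability >= 1 - delta, for a consistent h_S.
    return math.ceil((math.log(H_size) + math.log(1 / delta)) / eps)

# Illustrative: conjunctions over n = 10 boolean variables, |H| = 3**10.
print(consistent_sample_size(3**10, eps=0.1, delta=0.02))  # 149
```

Note the mild dependence on |H|: doubling the confidence requirement only adds a log-factor term.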

Conjunctions of Boolean Literals

Problem: learning the class C_n of conjunctions of boolean literals over at most n variables (e.g., for n = 3, x_1 \wedge \overline{x}_2 \wedge x_3).

Algorithm: choose h consistent with S. Example for n = 6: start with x_1 \wedge \overline{x}_1 \wedge \cdots \wedge x_6 \wedge \overline{x}_6 and rule out literals incompatible with positive examples.

Sample complexity: since |H| = |C_n| = 3^n (each variable appears positively, negatively, or not at all), a sample of size

\frac{1}{\epsilon}\Big((\log 3)\, n + \log \frac{1}{\delta}\Big)

suffices. Computational complexity: polynomial, since the algorithmic cost per training example is in O(n).
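The rule-out-literals algorithm above can be sketched directly. The target concept and the positive examples below are illustrative; each variable's state is a pair of flags recording whether the positive literal and the negated literal are still in the conjunction.

```python
def learn_conjunction(n, positives):
    # literals[i] = [keep x_i, keep not-x_i]; start with all 2n literals.
    literals = [[True, True] for _ in range(n)]
    for x in positives:            # x is a tuple of n bits
        for i, bit in enumerate(x):
            if bit == 1:
                literals[i][1] = False  # not-x_i incompatible with bit 1
            else:
                literals[i][0] = False  # x_i incompatible with bit 0
    return literals

def predict(literals, x):
    # 1 iff every surviving literal is satisfied by x.
    return int(all((not p or x[i] == 1) and (not q or x[i] == 0)
                   for i, (p, q) in enumerate(literals)))

# Illustrative target concept: x1 AND not-x3, over n = 4 variables.
positives = [(1, 0, 0, 1), (1, 1, 0, 0), (1, 0, 0, 0)]
h = learn_conjunction(4, positives)
print(predict(h, (1, 1, 0, 1)))  # 1: satisfies x1 and not-x3
print(predict(h, (0, 1, 0, 1)))  # 0: violates x1
```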

This lecture

- PAC Model
- Sample complexity, finite H, consistent case
- Sample complexity, finite H, inconsistent case

Inconsistent Case

- No h \in H is a consistent hypothesis.
- The typical case in practice: difficult problems, complex concept class.
- But inconsistent hypotheses with a small number of errors on the training set can be useful.
- Need a more powerful tool: Hoeffding's inequality.

Hoeffding's Inequality

Corollary: for any \epsilon > 0 and any fixed hypothesis h: X \to \{0, 1\}, the following inequalities hold:

\Pr[\widehat{R}_S(h) - R(h) \geq \epsilon] \leq e^{-2m\epsilon^2}
\Pr[\widehat{R}_S(h) - R(h) \leq -\epsilon] \leq e^{-2m\epsilon^2}.

Combining the two,

\Pr[|\widehat{R}_S(h) - R(h)| \geq \epsilon] \leq 2 e^{-2m\epsilon^2}.

Can we apply that bound to the hypothesis h_S returned by our learning algorithm when training on sample S?

No, because h_S is not a fixed hypothesis: it depends on the training sample. Note also that \mathbb{E}[\widehat{R}_S(h_S)] is not a simple quantity such as R(h_S).

Instead, we need a bound that holds simultaneously for all hypotheses h \in H: a uniform convergence bound.

Theorem: let H be a finite hypothesis set. Then, for any \delta > 0, with probability at least 1 - \delta,

\forall h \in H, \quad R(h) \leq \widehat{R}_S(h) + \sqrt{\frac{\log |H| + \log \frac{2}{\delta}}{2m}}.

Proof: by the union bound,

\Pr\Big[\max_{h \in H} |R(h) - \widehat{R}_S(h)| > \epsilon\Big]
= \Pr\big[|R(h_1) - \widehat{R}_S(h_1)| > \epsilon \vee \cdots \vee |R(h_{|H|}) - \widehat{R}_S(h_{|H|})| > \epsilon\big]
\leq \sum_{h \in H} \Pr[|R(h) - \widehat{R}_S(h)| > \epsilon]
\leq 2|H| \exp(-2m\epsilon^2).

Setting the right-hand side to \delta and solving for \epsilon yields the bound.
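The gap term in this theorem is easy to evaluate numerically; a minimal sketch, with illustrative values of |H|, m, and \delta:

```python
import math

def finite_H_gap(H_size, m, delta):
    # sqrt((log|H| + log(2/delta)) / (2m)): the uniform-convergence
    # gap between R(h) and the empirical error, for every h in H.
    return math.sqrt((math.log(H_size) + math.log(2 / delta)) / (2 * m))

# Illustrative: |H| = 2**20 hypotheses, 10,000 samples, 95% confidence.
print(finite_H_gap(H_size=2**20, m=10_000, delta=0.05))  # about 0.03
```

Doubling m shrinks the gap only by a factor of sqrt(2), matching the O(1/sqrt(m)) remark below.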

Remarks

- Thus, for a finite hypothesis set, with high probability,

\forall h \in H, \quad R(h) \leq \widehat{R}_S(h) + O\Big(\sqrt{\frac{\log |H|}{m}}\Big).

- Error bound in O(1/\sqrt{m}) (quadratically worse than the O(1/m) of the consistent case).
- \log_2 |H| can be interpreted as the number of bits needed to encode H.
- Occam's razor principle (William of Occam): plurality should not be posited without necessity.

Occam's Razor

Principle formulated by the controversial theologian William of Occam: plurality should not be posited without necessity, often rephrased as "the simplest explanation is best":

- invoked in a variety of contexts, e.g., syntax;
- Kolmogorov complexity can be viewed as the corresponding framework in information theory;
- here, to minimize true error, choose the most parsimonious explanation (smallest |H|);
- we will see later other applications of this principle.

Lecture Summary

- C is PAC-learnable if there exists L such that for all c \in C and \epsilon, \delta > 0, for m = \mathrm{poly}(1/\epsilon, 1/\delta),

\Pr_{S \sim D^m}[R(h_S) \leq \epsilon] \geq 1 - \delta.

- Consistent case: R(h) \leq \frac{1}{m}\big(\log |H| + \log \frac{1}{\delta}\big).

- Inconsistent case: R(h) \leq \widehat{R}_S(h) + \sqrt{\frac{\log |H| + \log \frac{2}{\delta}}{2m}}.

References

- Anselm Blumer, A. Ehrenfeucht, David Haussler, and Manfred K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM (JACM), Volume 36, Issue 4, 1989.
- Michael Kearns and Umesh Vazirani. An Introduction to Computational Learning Theory, MIT Press, 1994.
- Leslie G. Valiant. A theory of the learnable. Communications of the ACM, 27(11): 1134-1142 (1984).

Appendix

Universal Concept Class

Problem: each x \in X is defined by n boolean features. Let C be the set of all subsets of X.

Question: is C PAC-learnable?

Sample complexity: H must contain C. Thus,

|H| \geq |C| = 2^{(2^n)}.

The consistent-case bound gives

m = \frac{1}{\epsilon}\Big((\log 2)\, 2^n + \log \frac{1}{\delta}\Big),

which requires an exponential sample size.

k-Term DNF Formulae

Definition: expressions of the form T_1 \vee \cdots \vee T_k, with each term T_i a conjunction of boolean literals over at most n variables.

Problem: learning k-term DNF formulae.

Sample complexity: |H| = |C| = 3^{nk}. Thus, polynomial sample complexity

\frac{1}{\epsilon}\Big((\log 3)\, nk + \log \frac{1}{\delta}\Big).

Time complexity: intractable if RP \neq NP: the class is then not efficiently PAC-learnable (proof by reduction from graph 3-coloring). But a strictly larger class is!

k-CNF Expressions

Definition: expressions T_1 \wedge \cdots \wedge T_j of arbitrary length j, with each term T_i a disjunction of at most k boolean attributes.

Algorithm: reduce the problem to that of learning conjunctions of boolean literals by introducing (2n)^k new variables:

(u_1, \ldots, u_k) \mapsto Y_{u_1, \ldots, u_k}.

The effect of the transformation on the distribution is not an issue: PAC-learning allows any distribution D.
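The variable substitution above can be sketched as a feature map: each instance over n boolean variables (hence 2n literals, counting negations) is mapped to (2n)^k new features, one per k-tuple of literals, whose value is that of the corresponding disjunction. The small n = 2, k = 2 instance below is illustrative.

```python
from itertools import product

def literal_values(x):
    # The 2n literal values: x_1, ..., x_n, then their negations.
    return list(x) + [1 - b for b in x]

def transform(x, k):
    # One new feature Y_{u1,...,uk} per k-tuple of literals: the value
    # of the disjunction u1 OR ... OR uk on instance x.
    lits = literal_values(x)
    return tuple(int(any(t)) for t in product(lits, repeat=k))

x = (1, 0)               # n = 2 variables
print(len(transform(x, 2)))  # (2n)^k = 16 new features
```

A k-CNF over the original variables then becomes a plain conjunction over the new variables, so the conjunction learner above applies.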

k-CNF Expressions

Observation: any k-term DNF formula can be written as a k-CNF expression. By distributivity,

\bigvee_{i=1}^{k} (u_{i,1} \wedge \cdots \wedge u_{i,n_i}) = \bigwedge_{j_1 \in [1, n_1], \ldots, j_k \in [1, n_k]} (u_{1,j_1} \vee \cdots \vee u_{k,j_k}).

Example:

(u_1 \wedge u_2 \wedge u_3) \vee (v_1 \wedge v_2 \wedge v_3) = \bigwedge_{i,j=1}^{3} (u_i \vee v_j).

But, in general, converting a k-CNF (equivalent to a k-term DNF) to a k-term DNF is intractable.

Key aspects of the PAC model illustrated here: the cost of the representation of the concept c, and the choice of the hypothesis set H.
