Lecture 3: Feasibility of Learning
Hsuan-Tien Lin (林軒田)
htlin@csie.ntu.edu.tw
Roadmap
1 When Can Machines Learn?
A Learning Puzzle
(note: batched, supervised, binary classification)
[Figure: three puzzle rows; in each, patterns x1, x2, x3 are labeled yn = −1 and patterns x4, x5, x6 are labeled yn = +1, and a new pattern asks g(x) = ?]

(note: maybe there is no definite answer; it depends on what hypotheses we choose for g(x))
A Brain-Storming Problem

(5, 3, 2) → 151022, (7, 2, 5) → ?

Two equally consistent answers:
• 151026, by g(x) = 151012 + x1 + x2 + x3
• 143547, by g(x) = x1 · x2 · 10000 + x1 · x3 · 100 + (x1 · x2 + x1 · x3 − x2)

Sequence puzzles of the same flavor:
• 1, 4, 1, 5, 0, −1, 1, 6 by yt = yt−4 − yt−2
• 1, 4, 1, 5, 1, 6, 1, 7 by yt = yt−2 + ⟦t is even⟧
• 1, 4, 1, 5, 2, 9, 3, 14 by yt = yt−4 + yt−2
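The sequence rules can be checked mechanically; a minimal sketch, where the `extend` helper and the 1-based indexing convention are mine, not from the slides:

```python
# Extend a seed sequence with a recurrence rule; indices are 1-based,
# matching the y_t notation above.  The helper name `extend` is mine.
def extend(seed, rule, upto):
    y = list(seed)
    while len(y) < upto:
        y.append(rule(y, len(y) + 1))  # len(y) + 1 is the next 1-based index t
    return y

seed = [1, 4, 1, 5]
print(extend(seed, lambda y, t: y[t - 5] - y[t - 3], 8))      # [1, 4, 1, 5, 0, -1, 1, 6]
print(extend(seed, lambda y, t: y[t - 3] + (t % 2 == 0), 8))  # [1, 4, 1, 5, 1, 6, 1, 7]
print(extend(seed, lambda y, t: y[t - 5] + y[t - 3], 8))      # [1, 4, 1, 5, 2, 9, 3, 14]
```

All three rules agree on the first four terms, which is exactly the puzzle's point: the data alone cannot single out the rule.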
Infeasibility of Learning

x   | y (in D) | f1 f2 f3 f4 f5 f6 f7 f8
000 |    ◦     | ◦  ◦  ◦  ◦  ◦  ◦  ◦  ◦
001 |    ×     | ×  ×  ×  ×  ×  ×  ×  ×
010 |    ×     | ×  ×  ×  ×  ×  ×  ×  ×
011 |    ◦     | ◦  ◦  ◦  ◦  ◦  ◦  ◦  ◦
100 |    ×     | ×  ×  ×  ×  ×  ×  ×  ×
101 |    ?     | ◦  ◦  ◦  ◦  ×  ×  ×  ×
110 |    ?     | ◦  ◦  ×  ×  ◦  ◦  ×  ×
111 |    ?     | ◦  ×  ◦  ×  ◦  ×  ◦  ×

(all eight fk agree with y on D, but enumerate every possible labeling of the three points outside D)

• g ≈ f inside D: sure!
• g ≈ f outside D: no! (but that's really what we want!)

no algorithm is best for all learning problems
Hsuan-Tien Lin (NTU CSIE) Machine Learning 10/35
Feasibility of Learning: Learning is Impossible?
Questions?
[Figure: a bin of orange and green marbles (top), with a sample of size N drawn from it (bottom)]

does the in-sample ν say anything about the out-of-sample µ?

• No! possibly not: the sample can be mostly green while the bin is mostly orange
• Yes! probably yes: the in-sample ν is likely close to the unknown µ

µ = orange probability in the bin; ν = orange fraction in the sample

• in a big sample (N large), ν is probably close to µ (within ε)

the statement 'ν = µ' is
probably approximately correct (PAC)
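The bin story can be simulated directly; a small sketch, where mu (the bin's orange probability) and the sample sizes are illustrative choices of mine:

```python
import random

random.seed(0)
mu = 0.6  # assumed orange probability in the bin (unknown in real learning)

def orange_fraction(N):
    """Draw a sample of size N from the bin and return nu."""
    return sum(random.random() < mu for _ in range(N)) / N

for N in (10, 100, 100000):
    nu = orange_fraction(N)
    print(N, nu, abs(nu - mu))  # the gap |nu - mu| tends to shrink as N grows
```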
Feasibility of Learning: Probability to the Rescue
Hoeffding's Inequality

for a sample of size N,

P[|ν − µ| > ε] ≤ 2 exp(−2ε²N)

(note: the left-hand side is the probability that something bad happens, i.e. the sample misleads us about the bin, and Hoeffding says it is small)

• does not depend on µ, no need to 'know' µ
• larger sample size N or looser gap ε
  ⇒ higher probability for 'ν ≈ µ'
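A Monte Carlo check of the inequality; mu, eps, N, and the trial count below are illustrative, not from the slides:

```python
import math
import random

random.seed(1)
mu, eps, N, trials = 0.4, 0.1, 100, 20000

# Empirical frequency of the BAD event |nu - mu| > eps ...
bad = 0
for _ in range(trials):
    nu = sum(random.random() < mu for _ in range(N)) / N
    bad += abs(nu - mu) > eps

# ... versus the Hoeffding bound, which holds for any mu.
bound = 2 * math.exp(-2 * eps**2 * N)
print(f"empirical {bad / trials:.4f} <= bound {bound:.4f}")
```

The bound is loose here, as expected: it is a worst-case guarantee that never uses the value of mu.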
Questions?
Connection to Learning

within the hypothesis set H, fix one h and color the marbles by the difference between the prediction and the ideal value:

• orange marble: h is wrong, h(x) ≠ f(x) (so µ measures how much of the space h gets wrong)
• green marble: h is right, h(x) = f(x)

assuming i.i.d. marbles, i.e. checking h(xn) against f(xn) on i.i.d. xn:

• Eout(h): the unknown probability over the whole bin that h(x) ≠ f(x)
• Ein(h) = (1/N) Σ_{n=1}^{N} ⟦h(xn) ≠ yn⟧, known from the data at hand by checking whether h(xn) and yn agree
Verification of One h

for any fixed h, when the data is large enough, Ein(h) ≈ Eout(h)

but with only one h this is verification rather than learning; in real learning, A shall make choices within H (like PLA): it should be able to choose among hypotheses instead of checking a single fixed one.
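Verification of one h can be sketched end to end with a toy target; f, h, and N here are my stand-ins, since the real f is unknown:

```python
import random

random.seed(2)
f = lambda x: 1 if x > 0.5 else -1   # toy "unknown" target
h = lambda x: 1 if x > 0.6 else -1   # the one fixed hypothesis to verify

def E_in(h, data):
    """In-sample error: fraction of points where h disagrees with the label."""
    return sum(h(x) != y for x, y in data) / len(data)

N = 1000
data = [(x, f(x)) for x in (random.random() for _ in range(N))]
print(E_in(h, data))  # should be near Eout(h) = P[0.5 < x <= 0.6] = 0.1
```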
Questions?
Multiple h

[Figure: many bins, one per hypothesis h1, h2, …, hM; each way of coloring the marbles corresponds to one bin]

Coin Game

Q: if 400 people each flip a fair coin 5 times and one of them gets 5 heads, is her coin special?

A: No. Even if all coins are fair, the probability that one of the coins results in 5 heads is 1 − (31/32)^400 > 99%.
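The coin game is easy to simulate; M = 400 matches the slide's (31/32)^400, while the trial count is an illustrative choice of mine:

```python
import random

random.seed(3)
M, trials = 400, 2000

hits = 0
for _ in range(trials):
    # does at least one of the M fair coins come up heads 5 times in a row?
    hits += any(all(random.random() < 0.5 for _ in range(5)) for _ in range(M))

print(hits / trials)  # analytically 1 - (31/32)**400, about 0.999997
```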
(note: for one bin the bad-sample probability is small, but with more bins it may get worse)
BAD Data for Many h

BAD data for many h
⇔ there exists some h such that Eout(h) and Ein(h) are far away
(conversely, GOOD data ⇔ there exists no BAD h: Ein(h) and Eout(h) stay close for every h)

for each fixed h, Hoeffding bounds the chance of drawing a BAD dataset; imagine that the chance you get a BAD dataset is small:

PD[BAD D] = Σ_{all possible D} P(D) · ⟦BAD D⟧   (Hoeffding: small)

[Table: each dataset D1, D2, …, D1126, …, D5678, … is marked BAD or Good for each hypothesis h1, h2, h3, …, hM; Hoeffding bounds every row separately, PD[BAD D for hm] ≤ …, and the final row 'all' marks D as BAD whenever it is BAD for some hm]

Hoeffding is powerful: it holds for any hypothesis and in the worst case, and since M is finite we can bound the total BAD probability and make sure it is small.
by the union bound,

PD[BAD D] ≤ PD[BAD D for h1] + PD[BAD D for h2] + … + PD[BAD D for hM]
          ≤ 2 exp(−2ε²N) + 2 exp(−2ε²N) + … + 2 exp(−2ε²N)
          = 2M exp(−2ε²N)

where M is the number of hypotheses; both ε and N remain tunable.
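Plugging numbers into the union bound previews the trade-off that follows; the grid of M and N values is illustrative:

```python
import math

eps = 0.1
for M in (1, 100, 10000):
    for N in (100, 1000, 10000):
        # the multiple-bin Hoeffding bound 2 M exp(-2 eps^2 N) from above
        bound = 2 * M * math.exp(-2 * eps**2 * N)
        print(f"M={M:>5}  N={N:>5}  P[BAD] <= {min(bound, 1.0):.6f}")
```

For small N the bound is vacuous (above 1) when M is large; enough data makes it tiny even for many hypotheses.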
Questions?
[Figure: the learning flow, with the hypothesis set H (set of candidate formulas)]

Eout(g) ≈ Ein(g) ≈ 0
(test)     (train)
Trade-off on M

1 can we make sure that Eout(g) is close enough to Ein(g)?
2 can we make Ein(g) small enough?

small M:
  1 Yes! P[BAD] ≤ 2 · M · exp(…) is small
  2 No! too few choices
large M:
  1 No! P[BAD] ≤ 2 · M · exp(…) may be large
  2 Yes! many choices
Preview

Known:

P[|Ein(g) − Eout(g)| > ε] ≤ 2 · M · exp(−2ε²N)

Todo:
• establish a finite quantity that replaces M:

P[|Ein(g) − Eout(g)| > ε] ≤ 2 · mH · exp(−2ε²N)  (hoped)

• justify the feasibility of learning for infinite M
• study mH to understand its trade-off for the 'right' H, just like M
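The known bound also answers how much data suffices: requiring 2 · M · exp(−2ε²N) ≤ δ and solving for N gives N ≥ ln(2M/δ) / (2ε²). A sketch with illustrative ε and δ:

```python
import math

def n_needed(M, eps, delta):
    """Smallest N with 2 * M * exp(-2 * eps**2 * N) <= delta."""
    return math.ceil(math.log(2 * M / delta) / (2 * eps**2))

for M in (1, 100, 10000):
    print(M, n_needed(M, eps=0.1, delta=0.05))  # N grows only logarithmically in M
```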
Questions?
Summary
1 When Can Machines Learn?