Professional Documents
Culture Documents
(_hx“)
Lecture 1: Basics of Machine Learning
Hsuan-Tien Lin (ó“0)
htlin@csie.ntu.edu.tw
Roadmap
1 When Can Machines Learn?
data ML skill
What is skill?
improved
data ML performance
measure
improved
data ML performance
measure
Questions?
ML Applications: Medicine
data ML skill
By JulianVilla26;
licensed under CC BY-SA 4.0 via Wikimedia Commons
By Raimond Spekking;
licensed under CC BY-SA 4.0 via Wikimedia Commons
original picture by F.U.S.I.A. assistant and derivative work by Sylenius via Wikimedia Commons
face recognition
• data: faces and non-faces
• skill: predict which boxes contain faces
A Possible ML Solution
answer correctly ⇡ Jrecent strength of student > difficulty of questionK
• give ML 9 million records from 3000 students
• ML determines (reverse-engineers) strength and difficulty
automatically
A Hot Problem
• competition held by Netflix in 2006
• 100,480,507 ratings that 480,189 users gave to 17,770 movies
• 10% improvement = 1 million dollar prize
• similar competition (movies ! songs) held by Yahoo! in KDDCup
2011
• 252,800,275 ratings that 1,000,990 users gave to 624,961 songs
? us
ter
s ?
u ise
? A Possible ML Solution
dy n? ckb Cr
o me ctio blo om • pattern:
c a s sT
e s e s f er e
lik lik pre lik
rating viewer/movie factors
viewer
• learning:
Match movie and add contributions
from each factor
predicted
rating
known rating
viewer factors
! learned factors
movie
! unknown rating prediction
To
co
ac d y c
blo n co tent
m
me
tio on
ck nte
Cr
bu nt
uis
ste
ei
r?
n
it?
Questions?
Components of Learning:
Metaphor Using Credit Approval
Applicant Information
age 23 years
gender female
annual salary NTD 1,000,000
year in residence 1 year
year in job 0.5 year
current debt 200,000
{(xn , yn )} from f ML g
• target f unknown
(i.e. no programmable definition)
• hypothesis g hopefully ⇡ f
but possibly different from f
(perfection ‘impossible’ when f unknown)
hypothesis set
H
hypothesis set
H
machine learning:
use data to compute hypothesis g
that approximates target f
Questions?
Questions?
hypothesis set
H
d 4
= annual salary NTD 1,000,000 1 *30
weight
• For x = (x1 , x2 , · · · , xd ) ‘features of customer’, compute a
weighted ‘score’ and
Xd
approve credit if wi xi > threshold 〈
i=1
Xd
deny credit if wi xi < threshold
i=1
w 也 是机 器 系的
• Y: +1(good), 1(bad) , 0 ignored—linear formula h 2 H are
d
! !
X
h(x) = sign w i xi threshold
i=1 可以 更動 的 地⽅
d
! !
X
h(x) = sign wi xi threshold
i=1 做 比較 ⽤ ?
0 1
!
州
州 d
X
B C
圳 = sign @ wi xi + ( threshold) · (+1)A
| {z } | {z }
: i=1
!
w0 x0
d
X
Wd
= sign wi xi
- threshold
利 wo
⇣
= sign wT x
⌘
i=0
taller
WD ,
以 ⼆
[ ] 11
• each ‘tall’ w represents a hypothesis h & is multiplied with
‘tall’ x —will use tall versions to simplify notation
Perceptrons in R2 不合 和 器 ⼀些 資料 ,
o o
o
0 0
。
x
y
✗ ✗
✗
Questions?
Select g from H
H = all possible perceptrons, g =? linesink
ˋ
,
wo
☒
scaler
That’s it!
—A fault confessed is half redressed. :-) 知錯 能 改 :-)
Cyclic PLA
For t = 0, 1, . . .
1 find the next mistake of wt called xn(t) , yn(t)
⇣ ⌘
sign wTt xn(t) 6= yn(t)
Seeing is Believing
initially
Seeing is Believing
update: 1
w(t+1)
x1
update: 2
-
-_- correction
ii
-
-_-
w(t)
bugo w(t+1)
x9
ii
update: 3
y
w(t+1)
w(t)
x14
NOT perfct ,
Seeing is Believing
直 來回 調整
update: 4
Urrecf ,
-_-
x3 _
⼀ w(t)
_ w(t+1)
Seeing is Believing
update: 5
w(t)
w(t+1)
x9
Seeing is Believing
update: 6
w(t+1)
w(t)
x14
Seeing is Believing
update: 7
w(t)
w(t+1)
x9
Seeing is Believing
update: 8
w(t+1)
w(t)
x14
Seeing is Believing
update: 9
w(t)
w(t+1)
x9
Seeing is Believing
finally
f) ⼆ shiftingismanxvalue
rotationibigxvalue
wo →
-
wPLA
Xiifeatnresthatwehave ,
Learning: g ⇡ f ?
• on D, if halt, yes (no mistake)
• outside D: ??
• if not halting: ??
Questions?
Linear Separability
• if PLA halts (i.e. no more mistakes),
(necessary condition) D allows some w to make no mistake
• call such D linear separable
>
innerproductbeforeupdating ,
datasetproperty.gr
⇢ 。
R2
之
kw 1k
wTf w2 wTf w1 + 2
\ 2 2
⼆ 1k + R
⇢ kw
⼈ 2k 普 kw
wTf w3 wTf w2 + ⇢ \
kw 3k
2
普 \
kw 2
2k + R
2
··· ···
wTf wT T
wi.
f wT 1 + ⇢ kwT k2 導 kw \ 2
T 1k + R
2
✓ ◆2
wTf wT T⇢ R
喜 1
kwf k kwT k
p
1 TR
=) T
⇢
Hsuan-Tien Lin (NTU CSIE) Aotmistakecorrection
Machine Learning 41/43
Basics of Machine Learning Guarantee of PLA
Pros
simple to implement, fast, works in any dimension d
Cons
• ‘assumes’ linear separable D to halt
—property unknown in advance (no need for PLA if we know wf )
• not fully sure how long halting takes (⇢ depends on wf )
—though practically fast