Machine Learning ( - HX")

Machine Learning
(_hx“)
Lecture 1: Basics of Machine Learning
Hsuan-Tien Lin (ó“0)
htlin@csie.ntu.edu.tw
Department of Computer Science

& Information Engineering
National Taiwan University
(↵Àc'x«⌦Â↵˚)
Hsuan-Tien Lin (NTU CSIE) Machine Learning 0/43

Basics of Machine Learning
Roadmap
1 When Can Machines Learn?

What is Machine Learning
Applications of Machine Learning
Components of Machine Learning
Machine Learning and Other Fields
Perceptron Hypothesis Set
Perceptron Learning Algorithm (PLA)
Guarantee of PLA

Basics of Machine Learning What is Machine Learning
From Learning to Machine Learning

learning: acquiring skill
with experience accumulated from observations
observations learning skill
machine learning: acquiring skill

with experience accumulated/computed from data
data ML skill
What is skill?

A More Concrete Definition

skill
, improve some performance measure (e.g. prediction accuracy)
machine learning: improving some performance measure

with experience computed from data
improved
data ML performance
measure
An Application in Computational Finance

stock data ML more investment gain
Why use machine learning?

Yet Another Application: Tree Recognition
• ‘define’ trees and hand-program: difficult

• learn from data (observations) and
recognize: a 3-year-old can do so
• ‘ML-based tree recognition system’ can be
easier to build than hand-programmed
system
ML: an alternative route to

build complicated systems
The Machine Learning Route

ML: an alternative route to build complicated systems
Some Use Scenarios

• when human cannot program the system manually
—navigating on Mars
• when human cannot ‘define the solution’ easily
—speech/visual recognition
• when needing rapid decisions that humans cannot do
—high-frequency trading
• when needing to be user-oriented in a massive scale
—consumer-targeted marketing
Give a computer a fish, you feed it for a day;

teach it how to fish, you feed it for a lifetime. :-)

Key Essence of Machine Learning

machine learning: improving some performance measure
with experience computed from data
improved
data ML performance
measure
1 exists some ‘underlying pattern’ to be learned

—so ‘performance measure’ can be improved
2 but no programmable (easy) definition
—so ‘ML’ is needed
3 somehow there is data about the pattern
—so ML has some ‘inputs’ to learn from
key essence: help decide whether to use ML

Questions?

Basics of Machine Learning Applications of Machine Learning
ML Applications: Medicine
data ML skill
By DataBase Center for Life Science;

licensed under CC BY 4.0 via Wikimedia Commons
for computer-assisted diagnosis

• data:
• patient status
• past diagnosis from doctors
• skill: dialogue system that efficiently identifies disease of
patient
my student’s earlier work

as intern @ HTC DeepQ
ML-based Applications: Communication

data ML skill
By JulianVilla26;
licensed under CC BY-SA 4.0 via Wikimedia Commons
for 4G LTE communication

• data:
• channel information (the channel matrix representing mutual
information)
• configuration (precoding, modulation, etc.) that reaches the
highest throughput
• skill: predict best configuration to the base station in a new
environment
my student’s earlier work as intern @ MTK

ML-based Applications: Manufacturing

data ML skill
By Raimond Spekking;
licensed under CC BY-SA 4.0 via Wikimedia Commons
for PCB fault detection

• data: PCB images of normal and abnormal PCBs
& maybe human-marked faulty locations
• skill: predict which PCBs are faulty
ongoing research for smart factory

ML-based Applications: Security

data ML skill
original picture by F.U.S.I.A. assistant and derivative work by Sylenius via Wikimedia Commons
face recognition
• data: faces and non-faces
• skill: predict which boxes contain faces
mature ML technique, but often need tuning

for different needs

ML-based skill Applications: Education

data ML skill
• data: students’ records on quizzes on a Math tutoring system

• skill: predict whether a student can give a correct answer to
another quiz question
A Possible ML Solution
answer correctly ⇡ Jrecent strength of student > difficulty of questionK
• give ML 9 million records from 3000 students
• ML determines (reverse-engineers) strength and difficulty
automatically
key part of the world-champion system from

National Taiwan Univ. in KDDCup 2010

ML-based skill Applications: Recommender System

(1/2)
data ML skill
• data: how many users have rated some movies

• skill: predict how a user would rate an unrated movie
A Hot Problem
• competition held by Netflix in 2006
• 100,480,507 ratings that 480,189 users gave to 17,770 movies
• 10% improvement = 1 million dollar prize
• similar competition (movies ! songs) held by Yahoo! in KDDCup
2011
• 252,800,275 ratings that 1,000,990 users gave to 624,961 songs
How can machines learn our preferences?

ML-based skill Applications: Recommender System

(2/2)
? us
ter
s ?
u ise
? A Possible ML Solution
dy n? ckb Cr
o me ctio blo om • pattern:
c a s sT
e s e s f er e
lik lik pre lik
rating viewer/movie factors
viewer
• learning:
Match movie and add contributions
from each factor
predicted
rating
known rating
viewer factors
! learned factors
movie
! unknown rating prediction
To
co
ac d y c
blo n co tent
m
me
tio on
ck nte
Cr
bu nt
uis
ste
ei
r?
n
it?
key part of the world-champion (again!)

system from National Taiwan Univ.
in KDDCup 2011

Questions?

Basics of Machine Learning Components of Machine Learning
Components of Learning:
Metaphor Using Credit Approval
Applicant Information
age 23 years
gender female
annual salary NTD 1,000,000
year in residence 1 year
year in job 0.5 year
current debt 200,000
unknown pattern to be learned:

‘approve credit card good for bank?’

Formalize the Learning Problem

Basic Notations
• input: x 2 X (customer application)
• output: y 2 Y (good/bad after approving credit card)
• unknown pattern to be learned , target function:
f : X ! Y (ideal credit approval formula)
• data , training examples: D = {(x1 , y1 ), (x2 , y2 ), · · · , (xN , yN )}
(historical records in bank)
• hypothesis , skill with hopefully good performance:
g : X ! Y (‘learned’ formula to be used)
{(xn , yn )} from f ML g

Learning Flow for Credit Approval

unknown target function
f: X !Y
(ideal credit approval formula)
training examples learning final hypothesis

D : (x1 , y1 ), · · · , (xN , yN ) algorithm g⇡f
A
(historical records in bank) (‘learned’ formula to be used)
• target f unknown
(i.e. no programmable definition)
• hypothesis g hopefully ⇡ f
but possibly different from f
(perfection ‘impossible’ when f unknown)
What does g look like?

The Learning Model
A
hypothesis set
H
(set of candidate formula)
• assume g 2 H = {hk }, i.e. approving if

• h1 : annual salary > NTD 800,000
• h2 : debt > NTD 100,000 (really?)
• h3 : year in job  2 (really?)
• hypothesis set H:
• can contain good or bad hypotheses
• up to A to pick the ‘best’ one as g
learning model = A and H

Practical Definition of Machine Learning

f: X !Y
(ideal credit approval formula)

A
hypothesis set
H
machine learning:
use data to compute hypothesis g
that approximates target f

Questions?

Basics of Machine Learning Machine Learning and Other Fields
Machine Learning and Data Mining

Machine Learning Data Mining
use data to compute hypothesis g use (huge) data to find property
that approximates target f that is interesting
• if ‘interesting property’ same as ‘hypothesis that approximate

target’
—ML = DM (usually what KDDCup does)
• if ‘interesting property’ related to ‘hypothesis that approximate
target’
—DM can help ML, and vice versa (often, but not always)
• traditional DM also focuses on efficient computation in large
database
difficult to distinguish ML and DM in reality

Machine Learning and Statistics

Machine Learning Statistics
use data to compute hypothesis g use data to make inference
that approximates target f about an unknown process
• g is an inference outcome; f is something unknown

—statistics can be used to achieve ML
• traditional statistics also focus on provable results with math
assumptions, and care less about computation
statistics: many useful tools for ML

Machine Learning and Artificial Intelligence

Machine Learning Artificial Intelligence
use data to compute hypothesis g compute something
that approximates target f that shows intelligent behavior
• g ⇡ f is something that shows intelligent behavior

—ML can realize AI, among other routes
• e.g. chess playing
• traditional AI: game tree
• ML for AI: ‘learning from board data’
ML is one possible route to realize AI

Machine Learning Connects (Big) Data and AI

skill ⇡ artificial intelligence
(big) data ML artificial intelligence
ingredient tools/steps dish
Photos Licensed under CC BY 2.0 from Andrea Goh on Flickr
ML not the only tools, but

a popular family of tools

Questions?

Basics of Machine Learning Perceptron Hypothesis Set
Credit Approval Problem Revisited

Applicant Information
age 23 years
gender female
annual salary NTD 1,000,000
year in residence 1 year
f: X !Y
year in job 0.5 year
(ideal credit approval formula) current debt 200,000

A
hypothesis set
H
what hypothesis set can we use?

A Simple Hypothesis Set: the ‘Perceptron’

age 23 years 23 * 0 . 8
d 4
= annual salary NTD 1,000,000 1 *30
year in job 0.5 year 0

. 5 * 11 2 6
current debt 200,000 2 0 0 * O.0

1
weight
• For x = (x1 , x2 , · · · , xd ) ‘features of customer’, compute a
weighted ‘score’ and
Xd
approve credit if wi xi > threshold 〈
i=1
Xd
deny credit if wi xi < threshold
i=1
w 也是机器系的
• Y: +1(good), 1(bad) , 0 ignored—linear formula h 2 H are
d
! !
X
h(x) = sign w i xi threshold
i=1 可以更動的地⽅
called ‘perceptron’ hypothesis historically

Vector Form of Perceptron Hypothesis
d
! !
X
h(x) = sign wi xi threshold
i=1 做比較⽤ ?
0 1
!
州
州 d
X
B C
圳 = sign @ wi xi + ( threshold) · (+1)A
| {z } | {z }
: i=1
!
w0 x0
d
X
Wd
= sign wi xi
- threshold
利 wo
⇣
= sign wT x
⌘
i=0
taller
WD ,
以⼆
[ ] 11
• each ‘tall’ w represents a hypothesis h & is multiplied with
‘tall’ x —will use tall versions to simplify notation
what do perceptrons h ‘look like’?

Perceptrons in R2 不合和器⼀些資料 ,
h(x) = sign (w0 + w1 x1 + w2 x2 ) 要找出這條好的

截距斜率
lineig.lineiclassifierx.im
Wotmxt Wz Xz
⼆ 0 .
o o
o
0 0
。
x
y
✗ ✗
✗
• customer features x: points on the plane (or points in Rd )

• labels y : (+1), ⇥ (-1)
• hypothesis h: lines (or hyperplanes in Rd )
—positive on one side of a line, negative on the other side
• different line classifies customers differently
perceptrons , linear (binary) classifiers

Questions?

Basics of Machine Learning Perceptron Learning Algorithm (PLA)
Select g from H
H = all possible perceptrons, g =? linesink
• want: g ⇡ f (hard when f unknown)

• almost necessary: g ⇡ f on D, ideally ˋ
g(xn ) = f (xn ) = yn 結果 label
ˋ
,
• difficult: H is of infinite size

• idea: start from some g0 , and ‘correct’ its
mistakes on D
、
Line 不好旋転 line
, , 使靠近不対
的地⽅
n
will represent g0 by its weight vector w0

全部 vector iy V02
wo
☒
scaler

找 bug 旋転 line Perceptron Learning Algorithm

,
start from some w0 (say, 0), and ‘correct’ itsy=

mistakes
+1 on D w+y x
For t = 0, 1, . . . bug y= +1 w+y x

w
1 find a mistake of wt called xn(t) , yn(t)
x
⇣ ⌘
sign wTt xn(t) 6= yn(t)
w i
y= −1 w
2 (try to) correct the mistake by Rotationi
←
y=
w+−1
yx w x
wt+1 wt + yn(t) xn(t)
- x
. . . until no more mistakes w+y x
return last w (called wPLA ) as g
That’s it!
—A fault confessed is half redressed. :-) 知錯能改 :-)

Practical Implementation of PLA

start from some w0 (say, 0), and ‘correct’ its mistakes on D
Cyclic PLA
For t = 0, 1, . . .
1 find the next mistake of wt called xn(t) , yn(t)
⇣ ⌘
sign wTt xn(t) 6= yn(t)
2 correct the mistake by
wt+1 wt + yn(t) xn(t)
. . . until a full cycle of not encountering mistakes
next can follow naïve cycle (1, · · · , N)

or precomputed random cycle

Seeing is Believing
initially
worked like a charm with < 20 lines!!

(note: made xi x0 = 1 for visual purpose)
Seeing is Believing
update: 1
w(t+1)
x1

NOT perfeū Seeing is Believing
update: 2
-
-_- correction
ii
-
-_-
w(t)
bugo w(t+1)
x9

NOT perfecf Seeing is Believing

rotat
ii
update: 3
y
w(t+1)
w(t)
x14

NOT perfct ,
Seeing is Believing
直來回調整
update: 4
Urrecf ,
-_-
x3 _
⼀ w(t)
_ w(t+1)

Seeing is Believing
update: 5
w(t)
w(t+1)
x9

Seeing is Believing
update: 6
w(t+1)
w(t)
x14

Seeing is Believing
update: 7
w(t)
w(t+1)
x9

Seeing is Believing
update: 8
w(t+1)
w(t)
x14

Seeing is Believing
update: 9
w(t)
w(t+1)
x9

Seeing is Believing
finally
f) ⼆ shiftingismanxvalue
rotationibigxvalue
wo →
-
wPLA
Xiifeatnresthatwehave ,

Some Remaining Issues of PLA

‘correct’ mistakes on D until no mistakes
Algorithmic: halt (with no mistake)?

• naïve cyclic: ??
• random cyclic: ??
• other variant: ??
Learning: g ⇡ f ?
• on D, if halt, yes (no mistake)
• outside D: ??
• if not halting: ??
[to be shown] if (...), after ‘enough’ corrections,

any PLA variant halts

Questions?

Basics of Machine Learning Guarantee of PLA
Linear Separability
• if PLA halts (i.e. no more mistakes),
(necessary condition) D allows some w to make no mistake
• call such D linear separable
(linear separable) (not linear separable) (not linear separable)

的假設
linear 是嚴格
assume linear separable D, ⼀定可以找到條 1 的媽⼀
does PLA always halt?

PLA Fact: wt Gets More Aligned with wf

linear separable D , exists perfect wf such that yn = sign(wTf xn )
• wf perfect hence every xn correctly away from line: x y, o.0 2 Min

✗ 21 2 3
t 代表第 tgiteration ci
yn(t) wTf xn(t) min yn wTf xn > 0 ! :
7 i
n
XNYN 11.26
• wTf wt " by updating with any xn(t) , yn(t)

~ ⼀
wTf wt+1 = wTf wt + yn(t) xn(t) updateruleof PLA
innerproductatterupaating wTf wt + min yn wTf xn

n
繩更新 ix. 镦下
wTf wt + 0.
,
>
innerproductbeforeupdating ,
wt appears more aligned with wf after update

(really?)
PLA Fact: wt Does Not Grow Too Fast

wt changed only when mistake 有錯才要改 Wt 。
, sign wTt xn(t) 6= yn(t) , yn(t) wTt xn(t)  0
• mistake ‘limits’ kwt k2 growth, even when updating with ‘longest’ xn
kwt+1 k2 = kwt + yn(t) xn(t) k2

2
newlength = kwt k2 + 2yn(t) wTt xn(t) + kyn(t) xn(t) k2

oldlength EO
 kwt k2 + 0 + kyn(t) xn(t) k2
 kwt k2 + max kyn xn k2
n
Let’s put everything together!

PLA Mistake Bound
nsample 取 , Timistakecorrection ,
inner product grows fast length2 grows slow

確保⼀定有 T Iowerbound 910W 可能被 Iowerbound 2限制
T T
。
wf wt+1 wf wt + min yn wf xn garantee 2 2

kwt+1 k 善 kwt k + max kxn k
n
P 会因 Wt 定义⽽变
| {z } | n {z }
datasetproperty.gr
⇢ 。
R2
Magic Chain! Magic Chain!

owthby P
wTf w1 wTf w0 + ⇢ \ 2
導 kw0 k2 + R 2
之
kw 1k
wTf w2 wTf w1 + 2
\ 2 2
⼆ 1k + R
⇢ kw
⼈ 2k 普 kw
wTf w3 wTf w2 + ⇢ \
kw 3k
2
普 \
kw 2
2k + R
2
··· ···
wTf wT T
wi.
f wT 1 + ⇢ kwT k2 導 kw \ 2
T 1k + R
2
wfwī 2 Wiwot TP 11 Wīli E 11 Wolli TN

start from w0 = 0, after T mistake corrections,
cosangle ,
✓ ◆2
wTf wT T⇢ R
喜 1
kwf k kwT k
p
1 TR
=) T 
⇢
Hsuan-Tien Lin (NTU CSIE) Aotmistakecorrection
Machine Learning 41/43
More about PLA

Guarantee
as long as linear separable and correct by mistake
• inner product of wf and wt grows fast; length of wt grows slowly
• PLA ‘lines’ are more and more aligned with wf ) halts
Pros
simple to implement, fast, works in any dimension d
Cons
• ‘assumes’ linear separable D to halt
—property unknown in advance (no need for PLA if we know wf )
• not fully sure how long halting takes (⇢ depends on wf )
—though practically fast
but anyway, a simple and promising start

Summary
1 When Can Machines Learn?

What is Machine Learning
透过 hypothesh use data to approximate target
,
Applications of Machine Learning

almost everywhere
Components of Machine Learning
A takes D and H to get g
Machine Learning and Other Fields
related to DM, Stats and AI
Perceptron Hypothesis Set
hyperplanes/linear classifiers in Rd
Perceptron Learning Algorithm (PLA)
correct mistakes and improve iteratively
Guarantee of PLA
no mistake eventually if linear separable
• next: other machine learning problems 除了 binary 、
分類還有其他
,

Machine Learning ( - HX")

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Machine Learning ( - HX")

Uploaded by

Copyright:

Available Formats

Machine Learning

Department of Computer Science

Hsuan-Tien Lin (NTU CSIE) Machine Learning 0/43

Lecture 1: Basics of Machine Learning

Hsuan-Tien Lin (NTU CSIE) Machine Learning 1/43

From Learning to Machine Learning

observations learning skill

machine learning: acquiring skill

Hsuan-Tien Lin (NTU CSIE) Machine Learning 2/43

A More Concrete Definition

machine learning: improving some performance measure

An Application in Computational Finance

Why use machine learning?

Hsuan-Tien Lin (NTU CSIE) Machine Learning 3/43

Yet Another Application: Tree Recognition

• ‘define’ trees and hand-program: difficult

ML: an alternative route to

The Machine Learning Route

Some Use Scenarios

Give a computer a fish, you feed it for a day;

Hsuan-Tien Lin (NTU CSIE) Machine Learning 5/43

Key Essence of Machine Learning

1 exists some ‘underlying pattern’ to be learned

key essence: help decide whether to use ML

Hsuan-Tien Lin (NTU CSIE) Machine Learning 7/43

By DataBase Center for Life Science;

for computer-assisted diagnosis

my student’s earlier work

ML-based Applications: Communication

for 4G LTE communication

my student’s earlier work as intern @ MTK

ML-based Applications: Manufacturing

for PCB fault detection

ongoing research for smart factory

Hsuan-Tien Lin (NTU CSIE) Machine Learning 10/43

ML-based Applications: Security

mature ML technique, but often need tuning

Hsuan-Tien Lin (NTU CSIE) Machine Learning 11/43

ML-based skill Applications: Education

• data: students’ records on quizzes on a Math tutoring system

key part of the world-champion system from

Hsuan-Tien Lin (NTU CSIE) Machine Learning 12/43

ML-based skill Applications: Recommender System

• data: how many users have rated some movies

How can machines learn our preferences?

Hsuan-Tien Lin (NTU CSIE) Machine Learning 13/43

ML-based skill Applications: Recommender System

key part of the world-champion (again!)

Hsuan-Tien Lin (NTU CSIE) Machine Learning 14/43

Hsuan-Tien Lin (NTU CSIE) Machine Learning 15/43

unknown pattern to be learned:

Hsuan-Tien Lin (NTU CSIE) Machine Learning 16/43

Formalize the Learning Problem

Hsuan-Tien Lin (NTU CSIE) Machine Learning 17/43

Learning Flow for Credit Approval

training examples learning final hypothesis

What does g look like?

(set of candidate formula)

• assume g 2 H = {hk }, i.e. approving if

learning model = A and H

Practical Definition of Machine Learning

training examples learning final hypothesis

h(x) = sign (w0 + w1 x1 + w2 x2 ) 要找出這條好的