You are on page 1of 42

Machine Learning

(_hx“)
Lecture 2: The Learning Problems
Hsuan-Tien Lin (ó“0)
htlin@csie.ntu.edu.tw

Department of Computer Science


& Information Engineering
National Taiwan University
(↵Àc'x«⌦Â↵˚)

Hsuan-Tien Lin (NTU CSIE) Machine Learning 0/39


The Learning Problems

探討 不同 的 ML 問題 Roadmap
1 When Can Machines Learn?

Lecture 2: The Learning Problems


Learning with Different Output Space Y
Learning with Different Data Label yn
Learning with Different Protocol f ) (xn , yn )
Learning with Different Input Space X

Hsuan-Tien Lin (NTU CSIE) Machine Learning 1/39


The Learning Problems Learning with Different Output Space Y

Credit Approval Problem Revisited


age 23 years
gender female
annual salary NTD 1,000,000
year in residence 1 year
unknown target function year in job 0.5 year
f: X !Y current debt 200,000
(ideal credit approval formula) credit? {no( 1), yes(+1)}

training examples learning final hypothesis


D : (x1 , y1 ), · · · , (xN , yN ) algorithm g⇡f
A
(historical records in bank) (‘learned’ formula to be used)

hypothesis set
H

(set of candidate formula)

Y = { 1, +1}: binary classification

Hsuan-Tien Lin (NTU CSIE) Machine Learning 2/39


The Learning Problems Learning with Different Output Space Y

分類 .ie, ⼼ More Binary Classification Problems 單 選題 1


是非 題

• credit approve/disapprove
• email spam/non-spam
• patient sick/not sick
• ad profitable/not profitable
• answer correct/incorrect (KDDCup 2010)

core and important problem with


many tools as building block of other tools

Hsuan-Tien Lin (NTU CSIE) Machine Learning 3/39


The Learning Problems Learning with Different Output Space Y

Multiclass Classification: Coin Recognition Problem


辨認 不同 硬幣 ,

• classify US coins (1c, 5c, 10c, 25c)


25 by (size, mass) 4 种 美元
Mass

5
• Y = {1c, 5c, 10c, 25c}, or
1 Y = {1, 2, · · · , K } (abstractly) 多 選題
• binary classification: special case
10
with K = 2
Size

Other Multiclass Classification Problems


• written digits ) 0, 1, · · · , 9
• emails ) spam, primary, social, promotion, update (Google)

many applications in practice

Hsuan-Tien Lin (NTU CSIE) Machine Learning 4/39


The Learning Problems Learning with Different Output Space Y

Multiclass Classification for Object Recognition:


Which Fruit? 辨認 是 哪 种 ⽔果 ⼀

?
(image by Robert-Owen-Wahl from Pixabay)

apple orange strawberry kiwi


(images by Pexels, PublicDomainPictures, 192635, Rob van der Meijden from Pixabay)

Y = {apple, orange, strawberry, kiwi}

Hsuan-Tien Lin (NTU CSIE) Machine Learning 5/39


The Learning Problems Learning with Different Output Space Y

Which Fruits?

?: {apple, orange, kiwi}


(image by Michal Jarmoluk from Pixabay)

apple orange strawberry kiwi


(images by Pexels, PublicDomainPictures, 192635, Rob van der Meijden from Pixabay)

multilabel classification: 複選 題
classify input to multiple (or no) categories
Y = 2{apple,orange,strawberry,kiwi}
powerset ,

Hsuan-Tien Lin (NTU CSIE) Machine Learning 6/39


The Learning Problems Learning with Different Output Space Y

What Tags?

?: {machine learning, data structure, data mining, object


oriented programming, artificial intelligence, compiler,
architecture, chemistry, textbook, children book, . . . etc. }
another multilabel classification problem:
tagging input to multiple categories

Hsuan-Tien Lin (NTU CSIE) Machine Learning 7/39


The Learning Problems Learning with Different Output Space Y

Binary Relevance: Multilabel Classification via Yes/No


複選 也 可以 ⽤ ⼼ 1 No 來 決定
binary multilabel w/ L classes: L yes/no
classification questions
{yes, no} machine learning (Y), data structure (N), data
mining (Y), OOP (N), AI (Y), compiler (N),
architecture (N), chemistry (N), textbook (Y),
children book (N), etc.
• Binary Relevance (BR): reduction (transformation) to multiple
isolated binary classification
• disadvantages (addressed by more sophisticated models):
• isolation—hidden relations not exploited
(e.g. ML and DM highly correlated, ML subset of AI, textbook &
children book disjoint)
• imbalanced—few yes, many no

BR for multilabel classification:


uses binary classification as a core tool
Hsuan-Tien Lin (NTU CSIE) Machine Learning 8/39
The Learning Problems Learning with Different Output Space Y

Regression: Patient Recovery Prediction Problem


病⼈ 幾 天 可以 康復 ? 要 給 出 个 真實值 ⼀

• binary classification: patient features ) sick or not


• multiclass classification: patient features ) which type of cancer
• regression: patient features ) how many days before recovery
• Y = R or Y = [lower, upper] ⇢ R (bounded regression)
—deeply studied in statistics

Other Regression Problems


• company data ) stock price
• climate data ) temperature

also core and important with many ‘statistical’


tools as building block of other tools

Hsuan-Tien Lin (NTU CSIE) Machine Learning 9/39


The Learning Problems Learning with Different Output Space Y

Sophisticated Output: Image Generation Problems


Style Transfer output 很複雜

+ )
(Leonardo da Vinci, (Van Gogh, (Pjfinlay,
in Public Domain) in Public Domain) with CC0)
all images are downloaded from Wikipedia

Other Image Generation Problems


• noisy image ) clean image
• low-resolution image ) high-resolution image
t.tt
Y: a ‘manifold’ ⇢ Rw⇥h⇥c ,
arguably not just multi-pixel regression
Hsuan-Tien Lin (NTU CSIE) Machine Learning 10/39
The Learning Problems Learning with Different Output Space Y

Mini Summary
Learning with Different Output Space Y
是非 題 • binary classification: Y = { 1, +1}
選擇題 • multiclass classification: Y = {1, 2, · · · , K }
4逾
4選 … ⼼ 多 選題 • multilabel classification: Y = 2{1,2,··· ,K }
• regression: Y = R power 包含 所有 可能 的 了

unknown target function


f: X !Y
• image generation: Y ⇢ Rw⇥h⇥c
• . . . and a lot more!!

training examples learning final hypothesis


D : (x1 , y1 ), · · · , (xN , yN ) algorithm g⇡f
A

hypothesis set
H

core tools: binary classification and regression


Hsuan-Tien Lin (NTU CSIE) Machine Learning 11/39
The Learning Problems Learning with Different Output Space Y

Questions?

Hsuan-Tien Lin (NTU CSIE) Machine Learning 12/39


The Learning Problems Learning with Different Data Label yn

Supervised: Coin Recognition Revisited


25

Mass
5
unknown target function 1

f: X !Y
10

Size

training examples learning final hypothesis


D : (x1 , y1 ), · · · , (xN , yN ) algorithm g⇡f
A

hypothesis set
H

supervised learning:
every xn comes with corresponding yn

Hsuan-Tien Lin (NTU CSIE) Machine Learning 13/39


The Learning Problems Learning with Different Data Label yn

Unsupervised: Coin Recognition without yn


不但 只有 clustering ,

25
Mass

Mass
5
1

10

Size Size

supervised multiclass classification unsupervised multiclass classification


() ‘clustering’
知道 哪裡 確切 有
Other Clustering Problems
• articles ) topics
• consumer profiles ) consumer groups

clustering: a challenging but useful problem

Hsuan-Tien Lin (NTU CSIE) Machine Learning 14/39


The Learning Problems Learning with Different Data Label yn

Unsupervised: Coin Recognition without yn

25
Mass

Mass
5
1

10

Size Size

supervised multiclass classification unsupervised multiclass classification


() ‘clustering’

Other Clustering Problems


• articles ) topics
• consumer profiles ) consumer groups

clustering: a challenging but useful problem

Hsuan-Tien Lin (NTU CSIE) Machine Learning 14/39


The Learning Problems Learning with Different Data Label yn

Unsupervised: Learning without yn


Other Unsupervised Learning Problems
• clustering: {xn } ) cluster(x)
(⇡ ‘unsupervised multiclass classification’)
—i.e. articles ) topics
• density estimation: {xn } ) density(x) 比 1) ,

(⇡ ‘unsupervised bounded regression’) 縼 data


—i.e. traffic reports with location ) dangerous areas
• outlier detection: {xn } ) unusual(x)
(⇡ extreme ‘unsupervised binary classification’)
—i.e. Internet logs ) intrusion alert
• . . . and a lot more!!

unsupervised learning: diverse, with possibly


very different performance goals

Hsuan-Tien Lin (NTU CSIE) Machine Learning 15/39


The Learning Problems Learning with Different Data Label yn

Self-supervised: Unsupervised + Self-defined Goal(s)


jigsaw puzzle: pieces ! full picture
学習
在真正 task 前 ⾃ 洗学 習 像 是 step.it ep
, ,

(Figure 1 of Noroozi and Favaro,


Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles. ECCV 2016)

Other Popular Goals


• colorization: grayscale image ! colored image
• center word prediction: chunk of text ! center word
• next sentence prediction: sentence A ! is sentence B next?

self-supervised learning: recipe to learn


‘physical knowledge’ before actual task
Hsuan-Tien Lin (NTU CSIE) Machine Learning 16/39
The Learning Problems Learning with Different Data Label yn

Semi-supervised: Coin Recognition with Some yn

25 25
Mass

Mass

Mass
5 5
1 1

10 10

Size Size Size

supervised semi-supervised unsupervised (clustering)

Other Semi-supervised Learning Problems


• face images with a few labeled ) face identifier (Facebook)
• medicine data with a few labeled ) medicine effect predictor

semi-supervised learning: leverage


unlabeled data to avoid ‘expensive’ labeling

Hsuan-Tien Lin (NTU CSIE) Machine Learning 17/39


The Learning Problems Learning with Different Data Label yn

Weakly-supervised: Learning without True yn


complementary label: ȳn (‘not’ label) instead of yn

(Figure 1 of Yu et al., Learning with Biased Complementary Labels, ECCV 2018)

Other Weak Supervisions


• partial label: a set Yn that contains true yn
• noisy label: yn0 , a noisy version of true yn
• proportion label: aggregated statistics of a set of yn

weakly-supervised learning: another


realistic family to reduce labeling burden

Hsuan-Tien Lin (NTU CSIE) Machine Learning 18/39


The Learning Problems Learning with Different Data Label yn

Reinforcement Learning
a ‘very different’ but natural way of learning

Teach Your Dog: Say ‘Sit Down’


The dog pees on the ground.
BAD DOG. THAT’S A VERY WRONG ACTION.
• cannot easily show the dog that yn = sit
when xn = ‘sit down’
• but can ‘punish’ to say ỹn = pee is wrong

Other Reinforcement Learning Problems Using (x, ỹ , goodness)


• (customer, ad choice, ad click earning) ) ad system
• (cards, strategy, winning amount) ) black jack agent

reinforcement: learn with ‘partial/implicit


information’ (often sequentially)

Hsuan-Tien Lin (NTU CSIE) Machine Learning 19/39


The Learning Problems Learning with Different Data Label yn

Reinforcement Learning
a ‘very different’ but natural way of learning

Teach Your Dog: Say ‘Sit Down’


The dog sits down.
Good Dog. Let me give you some cookies.
• still cannot show yn = sit
when xn = ‘sit down’
• but can ‘reward’ to say ỹn = sit is good

Other Reinforcement Learning Problems Using (x, ỹ , goodness)


• (customer, ad choice, ad click earning) ) ad system
• (cards, strategy, winning amount) ) black jack agent

reinforcement: learn with ‘partial/implicit


information’ (often sequentially)

Hsuan-Tien Lin (NTU CSIE) Machine Learning 19/39


The Learning Problems Learning with Different Data Label yn

THE Most Well-known Reinforcement Learning Agent

(Public Domain, from Wikipedia; used here for education purpose; all other rights still belong to Google DeepMind)

Non-ML Techniques ML Techniques


Monte C. Tree Search Deep Learning Reinforcement Learn.
⇡ move simulation in ⇡ board analysis in ⇡ (self)-practice in
brain human brain human training

(CC-BY-SA 3.0 by Stannered on (CC-BY-SA 2.0 by Frej Bjon on (Public Domain, from Wikipedia)
Wikipedia) Wikipedia)

good AI: important to use the right


techniques—ML & others, including human

Hsuan-Tien Lin (NTU CSIE) Machine Learning 20/39


The Learning Problems Learning with Different Data Label yn
Mini Summary
Learning with Different Data Label yn
•• supervised: all yn
•• unsupervised: no yn
• self-supervised: self-defined yn0 from xn
• semi-supervised: some yn

• weakly-supervised: no true yn
unknown target function
f: X !Y
• reinforcement: implicit yn by goodness(ỹn )

• . . . and more!!

training examples learning final hypothesis


D : (x1 , y1 ), · · · , (xN , yN ) algorithm g⇡f
A

hypothesis set
H

core tool: supervised learning


Hsuan-Tien Lin (NTU CSIE) Machine Learning 21/39
The Learning Problems Learning with Different Data Label yn

Questions?

Hsuan-Tien Lin (NTU CSIE) Machine Learning 22/39


The Learning Problems Learning with Different Protocol f ) (xn , yn )

Batch Learning: Coin Recognition Revisited


25

Mass
5
unknown target function 1

f: X !Y
10

Size

training examples learning final hypothesis


D : (x1 , y1 ), · · · , (xN , yN ) algorithm g⇡f
A

hypothesis set
H

batch supervised multiclass classification:


learn from all known data

Hsuan-Tien Lin (NTU CSIE) Machine Learning 23/39


The Learning Problems Learning with Different Protocol f ) (xn , yn )

More Batch Learning Problems


25

Mass

Mass
5
1

10

Size Size

• batch of (email, spam?) ) spam filter


• batch of (patient, cancer) ) cancer
classifier
• batch of patient data ) group of patients

batch learning: a very common protocol

Hsuan-Tien Lin (NTU CSIE) Machine Learning 24/39


The Learning Problems Learning with Different Protocol f ) (xn , yn )

Online: Spam Filter that ‘Improves’


• batch spam filter:
learn with known (email, spam?) pairs, and predict with fixed g
• online spam filter, which sequentially:
1 observe an email xt
2 predict spam status with current gt (xt ) incrementalupdate
3 receive ‘desired label’ yt from user, and then update gt with (xt , yt )
hypotheses 键新
Connection to What We Have Learned
• PLA can be easily adapted to online protocol (how?) ˇ
• reinforcement learning is often done online (why?)
因為 feedback 会 seqnential 的 回來 .incrementalnpdate
,

online: hypothesis ‘improves’ through receiving


data instances sequentially

Hsuan-Tien Lin (NTU CSIE) Machine Learning 25/39


The Learning Problems Learning with Different Protocol f ) (xn , yn )

Online + Batch for Real-World Applications


online prediction
model re-trained with for each request
historical daily batch
data, incrementally or
completely batch labelled data
collected daily

purely online purely batch


• incremental update costly • cannot capture drifts/trends
online well
• delayed labels hard to • complete re-training
handle properly possibly costly

real-world ML system
different from textbook settings
Hsuan-Tien Lin (NTU CSIE) Machine Learning 26/39
The Learning Problems Learning with Different Protocol f ) (xn , yn )

Active Learning: Learning by ‘Asking’


Protocol , Learning Philosophy
• batch: ‘duck feeding’
• online: ‘passive sequential’
unknown target function
f: X !Y • active: ‘question asking’ (sequentially)
—query the yn of the chosen xn

training examples learning final hypothesis


D : (x1 , y1 ), · · · , (xN , yN ) algorithm g⇡f
A

hypothesis set
H

active: improve hypothesis with fewer labels


(hopefully) by asking questions strategically

Hsuan-Tien Lin (NTU CSIE) Machine Learning 27/39


The Learning Problems Learning with Different Protocol f ) (xn , yn )

Making Active Learning More Realistic

open-source tool libact developed by NTU CLLab (Yang, 2017)


https://github.com/ntucllab/libact
• including many popular strategies
• received > 500 stars and continuous issues

“libact is a Python package designed to make ac-


tive learning easier for real-world users”

Hsuan-Tien Lin (NTU CSIE) Machine Learning 28/39


The Learning Problems Learning with Different Protocol f ) (xn , yn )

Mini Summary
Learning with Different Protocol f ) (xn , yn )
• batch: all known data Offline =

• online: sequential (passive) data


• online + batch: best of both worlds
unknown target function
f: X !Y • active: strategically-observed data
• . . . and more!!

training examples learning final hypothesis


D : (x1 , y1 ), · · · , (xN , yN ) algorithm g⇡f
A

hypothesis set
H

core protocol: batch


Hsuan-Tien Lin (NTU CSIE) Machine Learning 29/39
The Learning Problems Learning with Different Protocol f ) (xn , yn )

Questions?

Hsuan-Tien Lin (NTU CSIE) Machine Learning 30/39


The Learning Problems Learning with Different Input Space X

Credit Approval Problem Revisited


age 23 years
gender female
annual salary NTD 1,000,000
unknown target function
year in residence 1 year
f: X !Y
year in job 0.5 year
(ideal credit approval formula) current debt 200,000

training examples learning final hypothesis


D : (x1 , y1 ), · · · , (xN , yN ) algorithm g⇡f
A
(historical records in bank) (‘learned’ formula to be used)

hypothesis set
H

(set of candidate formula)

concrete features: each dimension of X ✓ Rd


represents ‘sophisticated physical meaning’
Hsuan-Tien Lin (NTU CSIE) Machine Learning 31/39
The Learning Problems Learning with Different Input Space X

More on Concrete Features

• (size, mass) for coin classification


• customer info for credit approval 25

Mass
• patient info for cancer diagnosis 5
1
• often including ‘human intelligence’
on the learning task 10

Size

concrete features: the ‘easy’ ones for ML

Hsuan-Tien Lin (NTU CSIE) Machine Learning 32/39


The Learning Problems Learning with Different Input Space X

Raw Features: Digit Recognition Problem (1/2)

• digit recognition problem: features ) meaning of digit


• a typical supervised multiclass classification problem

Hsuan-Tien Lin (NTU CSIE) Machine Learning 33/39


The Learning Problems Learning with Different Input Space X
Raw Features: Digit Recognition Problem (2/2)
by Concrete Features by Raw Features

wronglabeimy.e
• 16 by 16 gray image x ⌘
(0, 0, 0.9, 0.6, · · · ) 2 R256
• ‘simple physical meaning’;
thus more difficult for ML
than concrete features

x =(symmetry, density)
1 較好 觕 5 較 dense 、

Other Problems with Raw Features


• image pixels, speech signal, etc.

raw features: often need human (‘feature


engineering’) or machines to convert to
concrete ones
Hsuan-Tien Lin (NTU CSIE) Machine Learning 34/39
The Learning Problems Learning with Different Input Space X

Deep Learning: ‘Automatic’ Conversion


from Raw to Concrete
positive weight
6
ahstractaatai 經 層層 転 鏩 negative weight 每 層 上去 都 更,

concrete ⼀些
变 成 有意✗ data is it a ‘1’ ? - z1 z5 is it a ‘5’ ?

patches
lmoreconcrete)
2 3 4 5
1 6
>

onekindoffeatnre
raw
, eytractfromimagei
• layered extraction: simple to complex features
• natural for difficult learning task with raw features, like vision

deep learning: currently popular in


vision/speech/. . .

Hsuan-Tien Lin (NTU CSIE) Machine Learning 35/39


The Learning Problems Learning with Different Input Space X

Abstract Features: Rating Prediction Problem


Rating Prediction Problem (KDDCup 2011)
• given previous (userid, itemid, rating) tuples, predict the rating that
some userid would give to itemid?
• a regression problem with Y ✓ R as rating and X ✓ N ⇥ N as
(userid, itemid)
• ‘no physical meaning’; thus even more difficult for ML

Other Problems with Abstract Features


• student ID in online tutoring system (KDDCup 2010)
• advertisement ID in online ad system

abstract: again need ‘feature


conversion/extraction/construction’

Hsuan-Tien Lin (NTU CSIE) Machine Learning 36/39


The Learning Problems Learning with Different Input Space X

Mini Summary
Learning with Different Input Space X
• concrete: sophisticated (and related)
physical meaning
• raw: simple physical meaning
unknown target function
f: X !Y • abstract: no (or little) physical meaning
• . . . and more!!

training examples learning final hypothesis


D : (x1 , y1 ), · · · , (xN , yN ) algorithm g⇡f
A

hypothesis set
H

‘easy’ input: concrete


Hsuan-Tien Lin (NTU CSIE) Machine Learning 37/39
The Learning Problems Learning with Different Input Space X

Questions?

Hsuan-Tien Lin (NTU CSIE) Machine Learning 38/39


The Learning Problems Learning with Different Input Space X

Summary
1 When Can Machines Learn?

Lecture 1: Basics of Machine Learning


Lecture 2: The Learning Problems
Learning with Different Output Space Y
[classification], [regression], others
Learning with Different Data Label yn
[supervised], un/semi/weakly-sup., reinforcement
Learning with Different Protocol f ) (xn , yn )
[batch], online, active
Learning with Different Input Space X
[concrete], raw, abstract
• next: learning is impossible?!

Hsuan-Tien Lin (NTU CSIE) Machine Learning 39/39

You might also like