
6.891 Machine learning and neural networks

Instructor: Prof. Tommi Jaakkola
  tommi@ai.mit.edu
  NE43-735
TA: Jason Rennie
  jrennie@ai.mit.edu
  NE43-733

Course administrivia

- Lectures TR 2.30-4pm (in this room)
- Recitations WF 1-2pm (in this room)
- Midterm (15%) and a final (30%)
- 6 bi-weekly problem sets (30%)
- Final project (25%)
- The course satisfies the AI requirement for Area II graduate students

Additional information in the handout...

Topics

- Classification, regression
- Neural networks, back-propagation
- Generalization, regularization
- Feature selection, feature induction
- Combining multiple models, additive models, boosting
- Support vector machines, kernels
- Complexity, VC-dimension
- Model selection, structural risk minimization
- Density estimation, parametric, non-parametric
- Clustering, agglomerative, mixture models
- Hidden Markov models, forward-backward algorithm, EM
- Bayesian networks, graphs, estimation, model selection
- Markov random fields, dynamic Bayesian networks
- Time series prediction, independent component analysis

Learning

- Some terminology: learning vs. adaptation
  - Adaptation is temporary and "resets" itself
    - e.g., brightness adaptation of eyes
  - Learning involves a persistent change

Learning cont'd

- No learning takes place without assumptions!
- Assumptions/biases come in many flavors:
  - representation: how information is stored in the observed examples, such as images
  - qualitative relations: independence, etc.
- A change in the representation that nevertheless preserves the relevant information can preclude learning

Representation and assumptions

- Example: image labeling

Machine learning

- Machine learning is everywhere... speech recognition, medical diagnosis, image/text retrieval, commercial software, ...
- It is not just deduction, but induction of rules and properties from examples to facilitate prediction and decision making
- Research in this area tries to understand why, how, and under what conditions learning takes place
- Strong ties to statistical inference/estimation (Why?)

Boldly goes where angels fear to tread...

Machine learning problems

- Types of learning problems:
  1. Supervised learning (with a teacher)
  2. Unsupervised learning (no teacher, objective)
  3. Reinforcement learning (receive a cost associated with an action)
- This is neither a complete nor a "precise" list
  - e.g., active learning

Supervised learning

- We are given a set of input examples {x_1, ..., x_n}, where
    x_i = (x_i1, ..., x_id)^T,  d = dim(x_i),
  and corresponding outputs {y_1, ..., y_n}.
- Our task is to learn a mapping or function f: X → Y from inputs x ∈ X to outputs y ∈ Y such that
    y_i ≈ f(x_i),  i = 1, ..., n        (1)
  where n is the number of training examples.

For any problem there is a simple solution that is wrong.

Supervised learning
Example: digit recognition (8x8 binary digits)

  binary digit      target label
  [digit image]     "2"
  [digit image]     "2"
  [digit image]     "1"
  [digit image]     "1"
  ...               ...

- We wish to learn the mapping from digits to labels
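
As a concrete, purely illustrative sketch, the training set {(x_i, y_i)} can be stored as an n × d matrix of inputs together with a length-n vector of labels. The random pixels, the labels, and the placeholder classifier f below are assumptions for illustration, not the course's digit data:

```python
import numpy as np

# Hypothetical stand-in for the 8x8 binary digit data (random pixels, not
# the actual course dataset): n examples, each flattened into a
# d = 64 dimensional input vector x_i, with a corresponding label y_i.
n, d = 4, 64
rng = np.random.default_rng(0)

X = rng.integers(0, 2, size=(n, d))   # inputs x_1, ..., x_n (binary pixels)
y = np.array([2, 2, 1, 1])            # outputs y_1, ..., y_n (target labels)

# The goal of supervised learning: find a function f with y_i ~ f(x_i).
def f(x):
    # Placeholder rule, only to make the interface concrete:
    # predict "2" if more than half of the pixels are on, otherwise "1".
    return 2 if x.sum() > d / 2 else 1

print([f(xi) for xi in X], "vs. targets", list(y))
```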

Supervised learning

- The input/output spaces may be discrete or continuous
- Classification (qualitative): continuous/discrete inputs, discrete outputs
  - (binary pixel values) → digit label
  - (gray-level pixel values) → digit label
- Regression (quantitative): continuous/discrete inputs, continuous outputs
  - (interest rate) → stock price
  - (stock(t-1), stock(t)) → stock(t+1)

Supervised learning

- Given a set of training examples {(x_1, y_1), ..., (x_n, y_n)}, we want to learn a mapping f: X → Y such that
    y_i ≈ f(x_i),  i = 1, ..., n        (2)
- We need to parameterize the set of functions (Why?)
  For example: f(x) = ax + b, where we learn the parameters a and b.

[Figure: a linear function f fit to the training examples]

- BUT: how did we "fit" the function to the training examples?
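
The next slide answers this by minimizing a loss over the training set. As a preview, here is a minimal sketch, on synthetic data (the points and noise level are assumptions, not the plotted example), of learning the parameters a and b of f(x) = ax + b by ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 1-D regression data standing in for the plotted training examples.
x = rng.uniform(-2, 2, size=20)
y = 1.5 * x - 0.5 + rng.normal(scale=0.3, size=20)

# Parameterize f(x) = a*x + b and learn (a, b) by least squares:
# stack a column of x's and a column of ones, then solve the linear system.
A = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)

print(f"learned a = {a:.2f}, b = {b:.2f}")
print("first few predictions f(x_i):", (a * x + b)[:3])
```
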
Supervised learning

- We define a loss function Loss(y, f(x)) that measures how much our predictions deviate from the given answers:
    y_i ≈ f(x_i),  Loss(y_i, f(x_i)),  i = 1, ..., n        (3)
  For example: Loss(y_i, f(x_i)) = (y_i − f(x_i))^2.
- To "fit" the parameters, such as the linear coefficients a and b in f(x) = ax + b, we can minimize the training error/loss:
    Σ_{i=1}^n Loss(y_i, f(x_i))        (4)

Supervised learning

- Of course, the more parameters we have in the function, the smaller the training error we can achieve

[Figures: f = linear and f = 4th-order polynomial fit to the same training examples]

- Overfitting
- Generalization
- Model selection
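
A minimal sketch of eq. (4) on synthetic data (the data and the polynomial degrees are assumptions chosen to mirror the figures): fitting a linear function and a 4th-order polynomial by minimizing the squared training loss, and checking that the model with more parameters attains the smaller training error even though it may generalize worse:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic training examples (placeholder for the plotted data).
x = rng.uniform(-2, 2, size=10)
y = 1.5 * x - 0.5 + rng.normal(scale=0.5, size=10)

def training_loss(coeffs, x, y):
    # Squared-error training loss of eq. (4): sum_i (y_i - f(x_i))^2.
    return np.sum((y - np.polyval(coeffs, x)) ** 2)

# np.polyfit minimizes exactly this squared loss over polynomial coefficients.
linear  = np.polyfit(x, y, deg=1)   # f(x) = a x + b
quartic = np.polyfit(x, y, deg=4)   # f = 4th-order polynomial

print("linear  training loss:", training_loss(linear, x, y))
print("quartic training loss:", training_loss(quartic, x, y))
# The quartic fit never does worse on the training set, which is why a small
# training error alone does not guarantee good generalization.
```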

Unsupervised learning

- The digits again...
- Clustering: group "similar" digits together?
- Density estimation: model the population of digits

Clustering

- Pairwise clustering: for any two examples x_i and x_j we need a similarity or dissimilarity measure
    D_ij = D(x_i, x_j),  i, j = 1, ..., n        (5)
- There are several problems here:
  1. How to choose the dissimilarity measure?
  2. How to exploit the pairwise measure to group examples?
  3. How to assess the quality of the resulting grouping?
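
A minimal sketch of eq. (5), with Euclidean distance as one (arbitrary) choice of dissimilarity and a single-linkage style grouping at a fixed threshold as one simple way to exploit D; the 2-D toy data and the threshold t are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data: two loose groups of 2-D points (stand-ins for the digit vectors).
X = np.vstack([rng.normal(0.0, 0.3, size=(5, 2)),
               rng.normal(3.0, 0.3, size=(5, 2))])
n = len(X)

# Pairwise dissimilarity D_ij = ||x_i - x_j||, eq. (5).
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# One simple grouping rule: link i and j whenever D_ij < t and take the
# connected components (single-linkage clustering at a threshold t).
t = 1.0
labels = np.arange(n)
for i in range(n):
    for j in range(n):
        if D[i, j] < t:
            labels[labels == labels[j]] = labels[i]  # merge the two components

print("dissimilarity matrix D:\n", D.round(2))
print("cluster labels:", labels)
```
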
Density estimation

- We wish to construct a generative probability model of the examples in a given population
- A closely related problem is how to best compress a given set of examples.

Reinforcement learning

- There's no teacher, but we receive (possibly noisy) feedback about how good our actions are
  For example: take action a → receive Cost(a, ω)
  Often we do not know the cost function Cost(a, ω) but only observe its values after taking actions.
- The goal is to select the action that minimizes the expected cost
    E_ω{ Cost(a, ω) } ≈ (1/n) Σ_{i=1}^n Cost(a, ω_i)        (6)
- This becomes much more interesting when our actions actually affect the state of the environment... games, planning, etc.
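
A minimal sketch of the sample-average approximation in eq. (6); the action set and the cost function below are made up (in a real problem we could not inspect Cost(a, ω), only observe its values after acting):

```python
import numpy as np

rng = np.random.default_rng(4)

actions = [0, 1, 2]

def observed_cost(a, omega):
    # Made-up cost; we pretend we can only query it, not inspect its form.
    return (a - 1.2) ** 2 + omega

# Estimate E_omega{ Cost(a, omega) } for each action by the sample average
# over n observed situations omega_1, ..., omega_n, as in eq. (6).
n = 1000
omegas = rng.normal(size=n)
estimates = {a: np.mean([observed_cost(a, w) for w in omegas]) for a in actions}

best_action = min(estimates, key=estimates.get)
print("estimated expected costs:", estimates)
print("action minimizing the estimated expected cost:", best_action)
```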

Summary

Some of the key concepts we have discussed:

- Assumptions & representation in learning
- The role of probability in machine learning
- Overfitting, generalization, model selection
- Pairwise clustering, generative density estimation
- Minimizing expected cost
