
6.891 Machine learning and neural networks

Instructor: Prof. Tommi Jaakkola
  tommi@ai.mit.edu
  NE43-735
TA: Jason Rennie
  jrennie@ai.mit.edu
  NE43-733

Course administrivia

- Lectures TR 2.30-4pm (in this room)
- Recitations WF 1-2pm (in this room)
- Midterm (15%) and a final (30%)
- 6 bi-weekly problem sets (30%)
- Final project (25%)
- The course satisfies the AI requirement for Area II graduate students

Additional information in the handout...

Topics

- Classification, regression
- Neural networks, back-propagation
- Generalization, regularization
- Feature selection, feature induction
- Combining multiple models, additive models, boosting
- Support vector machines, kernels
- Complexity, VC-dimension
- Model selection, structural risk minimization
- Density estimation, parametric, non-parametric
- Clustering, agglomerative, mixture models
- Hidden Markov models, forward-backward algorithm, EM
- Bayesian networks, graphs, estimation, model selection
- Markov random fields, dynamic Bayesian networks
- Time series prediction, independent component analysis

Learning

- Some terminology: learning vs. adaptation
  - Adaptation is temporary and "resets" itself
    - e.g., brightness adaptation of eyes
  - Learning involves a persistent change

Learning cont'd

- No learning takes place without assumptions!
- Assumptions/biases come in many flavors:
  - representation: how information is stored in the observed examples, such as images
  - qualitative relations: independence, etc.
- A change in the representation that nevertheless preserves the relevant information can preclude learning

Representation and assumptions

- Example: image labeling

Machine learning

- Machine learning is everywhere... speech recognition, medical diagnosis, image/text retrieval, commercial software, ...
- It is not just deduction, but induction of rules and properties from examples to facilitate prediction and decision making
- Research in this area tries to understand why, how, and under what conditions learning takes place
- Strong ties to statistical inference/estimation (Why?)

Boldly goes where angels fear to tread...

Machine learning problems

- Types of learning problems:
  1. Supervised learning (with a teacher)
  2. Unsupervised learning (no teacher, objective)
  3. Reinforcement learning (receive a cost associated with an action)
- This is neither a complete nor a "precise" list
  - e.g., active learning

Supervised learning

- We are given a set of input examples {x_1, ..., x_n}, where
    x_i = (x_i1, ..., x_id)^T,  d = dim(x_i),
  and corresponding outputs {y_1, ..., y_n}.
- Our task is to learn a mapping or function f: X → Y from inputs x ∈ X to outputs y ∈ Y such that
    y_i ≈ f(x_i),  i = 1, ..., n        (1)
  where n is the number of training examples.

For any problem there is a simple solution that is wrong.

Supervised learning
Example: digit recognition (8x8 binary digits)

  binary digit      target label
  [digit image]     "2"
  [digit image]     "2"
  [digit image]     "1"
  [digit image]     "1"
  ...               ...

- We wish to learn the mapping from digits to labels
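
As a concrete, purely illustrative sketch, the training set {(x_i, y_i)} can be stored as an n × d matrix of inputs together with a length-n vector of labels. The random pixels, the labels, and the placeholder classifier f below are assumptions for illustration, not the course's digit data:

```python
import numpy as np

# Hypothetical stand-in for the 8x8 binary digit data (random pixels, not
# the actual course dataset): n examples, each flattened into a
# d = 64 dimensional input vector x_i, with a corresponding label y_i.
n, d = 4, 64
rng = np.random.default_rng(0)

X = rng.integers(0, 2, size=(n, d))   # inputs x_1, ..., x_n (binary pixels)
y = np.array([2, 2, 1, 1])            # outputs y_1, ..., y_n (target labels)

# The goal of supervised learning: find a function f with y_i ~ f(x_i).
def f(x):
    # Placeholder rule, only to make the interface concrete:
    # predict "2" if more than half of the pixels are on, otherwise "1".
    return 2 if x.sum() > d / 2 else 1

print([f(xi) for xi in X], "vs. targets", list(y))
```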

Supervised learning

- The input/output spaces may be discrete or continuous
- Classification (qualitative): continuous/discrete inputs, discrete outputs
  - (binary pixel values) → digit label
  - (gray-level pixel values) → digit label
- Regression (quantitative): continuous/discrete inputs, continuous outputs
  - (interest rate) → stock price
  - (stock(t-1), stock(t)) → stock(t+1)

Supervised learning

- Given a set of training examples {(x_1, y_1), ..., (x_n, y_n)}, we want to learn a mapping f: X → Y such that
    y_i ≈ f(x_i),  i = 1, ..., n        (2)
- We need to parameterize the set of functions (Why?)
  For example: f(x) = ax + b, where we learn the parameters a and b.

[Figure: a linear function f fit to the training examples]

- BUT: how did we "fit" the function to the training examples?
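
The next slide answers this by minimizing a loss over the training set. As a preview, here is a minimal sketch, on synthetic data (the points and noise level are assumptions, not the plotted example), of learning the parameters a and b of f(x) = ax + b by ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 1-D regression data standing in for the plotted training examples.
x = rng.uniform(-2, 2, size=20)
y = 1.5 * x - 0.5 + rng.normal(scale=0.3, size=20)

# Parameterize f(x) = a*x + b and learn (a, b) by least squares:
# stack a column of x's and a column of ones, then solve the linear system.
A = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)

print(f"learned a = {a:.2f}, b = {b:.2f}")
print("first few predictions f(x_i):", (a * x + b)[:3])
```
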
Supervised learning

- We define a loss function Loss(y, f(x)) that measures how much our predictions deviate from the given answers:
    y_i ≈ f(x_i),  Loss(y_i, f(x_i)),  i = 1, ..., n        (3)
  For example: Loss(y_i, f(x_i)) = (y_i − f(x_i))^2.
- To "fit" the parameters, such as the linear coefficients a and b in f(x) = ax + b, we can minimize the training error/loss:
    Σ_{i=1}^n Loss(y_i, f(x_i))        (4)

Supervised learning

- Of course, the more parameters we have in the function, the smaller the training error we can achieve

[Figures: f = linear and f = 4th-order polynomial fit to the same training examples]

- Overfitting
- Generalization
- Model selection
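
A minimal sketch of eq. (4) on synthetic data (the data and the polynomial degrees are assumptions chosen to mirror the figures): fitting a linear function and a 4th-order polynomial by minimizing the squared training loss, and checking that the model with more parameters attains the smaller training error even though it may generalize worse:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic training examples (placeholder for the plotted data).
x = rng.uniform(-2, 2, size=10)
y = 1.5 * x - 0.5 + rng.normal(scale=0.5, size=10)

def training_loss(coeffs, x, y):
    # Squared-error training loss of eq. (4): sum_i (y_i - f(x_i))^2.
    return np.sum((y - np.polyval(coeffs, x)) ** 2)

# np.polyfit minimizes exactly this squared loss over polynomial coefficients.
linear  = np.polyfit(x, y, deg=1)   # f(x) = a x + b
quartic = np.polyfit(x, y, deg=4)   # f = 4th-order polynomial

print("linear  training loss:", training_loss(linear, x, y))
print("quartic training loss:", training_loss(quartic, x, y))
# The quartic fit never does worse on the training set, which is why a small
# training error alone does not guarantee good generalization.
```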

Unsupervised learning

- The digits again...
- Clustering: group "similar" digits together?
- Density estimation: model the population of digits

Clustering

- Pairwise clustering: for any two examples x_i and x_j we need a similarity or dissimilarity measure
    D_ij = D(x_i, x_j),  i, j = 1, ..., n        (5)
- There are several problems here:
  1. How to choose the dissimilarity measure?
  2. How to exploit the pairwise measure to group examples?
  3. How to assess the quality of the resulting grouping?
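
A minimal sketch of eq. (5), with Euclidean distance as one (arbitrary) choice of dissimilarity and a single-linkage style grouping at a fixed threshold as one simple way to exploit D; the 2-D toy data and the threshold t are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data: two loose groups of 2-D points (stand-ins for the digit vectors).
X = np.vstack([rng.normal(0.0, 0.3, size=(5, 2)),
               rng.normal(3.0, 0.3, size=(5, 2))])
n = len(X)

# Pairwise dissimilarity D_ij = ||x_i - x_j||, eq. (5).
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# One simple grouping rule: link i and j whenever D_ij < t and take the
# connected components (single-linkage clustering at a threshold t).
t = 1.0
labels = np.arange(n)
for i in range(n):
    for j in range(n):
        if D[i, j] < t:
            labels[labels == labels[j]] = labels[i]  # merge the two components

print("dissimilarity matrix D:\n", D.round(2))
print("cluster labels:", labels)
```
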
Density estimation

- We wish to construct a generative probability model of the examples in a given population
- A closely related problem is how to best compress a given set of examples.

Reinforcement learning

- There's no teacher, but we receive (possibly noisy) feedback about how good our actions are
  For example: take action a → receive Cost(a, ω)
  Often we do not know the cost function Cost(a, ω) but only observe its values after taking actions.
- The goal is to select the action that minimizes the expected cost
    E_ω{ Cost(a, ω) } ≈ (1/n) Σ_{i=1}^n Cost(a, ω_i)        (6)
- This becomes much more interesting when our actions actually affect the state of the environment... games, planning, etc.
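
A minimal sketch of the sample-average approximation in eq. (6); the action set and the cost function below are made up (in a real problem we could not inspect Cost(a, ω), only observe its values after acting):

```python
import numpy as np

rng = np.random.default_rng(4)

actions = [0, 1, 2]

def observed_cost(a, omega):
    # Made-up cost; we pretend we can only query it, not inspect its form.
    return (a - 1.2) ** 2 + omega

# Estimate E_omega{ Cost(a, omega) } for each action by the sample average
# over n observed situations omega_1, ..., omega_n, as in eq. (6).
n = 1000
omegas = rng.normal(size=n)
estimates = {a: np.mean([observed_cost(a, w) for w in omegas]) for a in actions}

best_action = min(estimates, key=estimates.get)
print("estimated expected costs:", estimates)
print("action minimizing the estimated expected cost:", best_action)
```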

Summary

Some of the key concepts we have discussed:

- Assumptions & representation in learning
- The role of probability in machine learning
- Overfitting, generalization, model selection
- Pairwise clustering, generative density estimation
- Minimizing expected cost
