CS189/CS289A
Introduction to Machine Learning
Lecture 1: Overview
Alexei Efros and Peter Bartlett
January 20, 2015
1 / 37
Organizational Issues
Instructors: Peter Bartlett and Alyosha Efros.
GSIs: Peter Gao, Yun Park, Faraz Tavakoli, Kevin Tee, Pat Virtue,
Christopher Xie, Daniel Xu, Yuchen Zhang.
Discussion sections: You choose. If the room is full, please go to
another one. (If necessary, we may offer some specialty
sections; watch the website for announcements.)
Office hours: see web site.
http://www-inst.eecs.berkeley.edu/cs189
bCourses (+ piazza, kaggle), office hours, syllabus, assignments,
readings, lecture slides, announcements.
2 / 37
Organizational Issues
Assessment:
CS189
Homework 40%
Implementation and application of methods. (Kaggle)
Mathematical/reinforcement of concepts.
Seven total.
Late policy: 5 slip days total. That's it.
Midterm 20%
(Thursday, March 19, in the lecture slot.)
Final Exam 40%
3 / 37
Organizational Issues
Assessment:
CS289A Plus a project:
Homework 40%
Midterm 20%
Final Exam 20%
Final Project 20%
(due Friday, May 1. Proposal due Friday, April 3.)
4 / 37
Organizational Issues
(Real) Prerequisites:
Math53 (vector calculus); Math54 (linear algebra); CS70 (discrete
math, probability); CS188 (more probability, decision theory).
No screens in lectures. (To see why, google "laptops in class".)
Ethics:
Discussion of homework problems with other students is encouraged.
All homeworks must be written individually (including programming
components).
Please read the department policy on academic dishonesty. We will be
actively checking for plagiarism.
Questions: Use piazza. Public and private.
5 / 37
Texts
Trevor Hastie, Robert Tibshirani, Jerome Friedman.
The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
Second Edition. Springer Series in Statistics.
6 / 37
CS189: Introduction to Machine Learning
Machine Learning
Systems that learn to solve
information processing problems.
Learn
Use experience to improve performance:
data, queries, interaction, experiments
Statistical issues are central.
Systems
Computational issues are also central.
Algorithms, optimization.
7 / 37
An Overview of Machine Learning
Problems
Methods
Concepts
8 / 37
Classification Problems (Homework)
Email
Figure source: ESL
9 / 37
Classification
Figure sources: microsoft.com, apple.com, ESL, ISLR
11 / 37
Regression
ESL
17 / 37
Density Estimation
ESL
22 / 37
Dimensionality Reduction
ESL
24 / 37
Clustering
ESL
27 / 37
Machine Learning Problems
Classification
Regression
Density estimation
Dimensionality reduction
Clustering
Ranking
Collaborative filtering
Sequential decision problems:
bandits
contextual bandits
dynamic pricing
reinforcement learning
31 / 37
Methods
Linear classifiers: Perceptron
Support vector machines
Gaussian class conditionals
Logistic regression
Naive Bayes
Linear discriminant analysis
Linear regression
Decision trees, regression trees
Ensemble methods
Neural networks
Nearest neighbor
Principal components analysis
k-means clustering
Slide annotations: "Classification", "Regression", "Probabilistic modeling.", "Prediction; not based on a model."
33 / 37
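To make the first method in the list above concrete, here is a minimal sketch of a perceptron in Python. It is an illustration only, not taken from the course materials: the function names train_perceptron and predict, the epoch cap, and the toy data are all assumptions made for this example.

```python
# Minimal perceptron sketch (illustrative; names and data are hypothetical).

def train_perceptron(points, labels, epochs=100):
    """Learn weights w and bias b so that sign(w.x + b) matches labels in {-1, +1}."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified, or exactly on the boundary
                # Perceptron update: move the boundary toward the correct side.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass with no mistakes: stop early
            break
    return w, b


def predict(w, b, x):
    """Classify x as +1 or -1 using the learned linear rule."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical toy data: two linearly separable groups in the plane.
points = [(2.0, 1.0), (3.0, 2.0), (-1.0, -1.5), (-2.0, -0.5)]
labels = [1, 1, -1, -1]
w, b = train_perceptron(points, labels)
print([predict(w, b, x) for x in points])
```

The early stop only triggers when the training data are linearly separable; otherwise the loop simply runs for the assumed epoch cap.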
Concepts
Prediction versus probabilistic modeling.
Probabilistic modeling:
Generative versus discriminative models.
Maximum likelihood estimation.
Bayesian inference.
Optimization.
Convexity.
(Stochastic) gradient methods.
Newton's method.
Controlling complexity:
Bias-variance/approximation-estimation trade-off.
Regularization
Priors
Practical issues:
Train/validate/test. Over-fitting.
Resampling methods.
35 / 37
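The "Train/validate/test" item above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the course's procedure: the function name train_validate_test_split, the 60/20/20 proportions, and the fixed seed are all choices made for this example.

```python
import random


def train_validate_test_split(examples, validate_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle a copy of the examples and cut it into three disjoint parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(len(shuffled) * test_frac)
    n_validate = int(len(shuffled) * validate_frac)
    test = shuffled[:n_test]
    validate = shuffled[n_test:n_test + n_validate]
    train = shuffled[n_test + n_validate:]
    return train, validate, test


# Hypothetical data set of ten examples, identified by index.
data = list(range(10))
train, validate, test = train_validate_test_split(data)
print(len(train), len(validate), len(test))
```

The usual division of labor is to fit on the training part, choose between models or settings on the validation part, and look at the test part only once at the end, so that the final error estimate is not flattered by over-fitting.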
Overview (Part I: Bartlett)
Linear classification
Statistical learning background
Decision theory
Generative and discriminative models
Controlling complexity.
Resampling, cross-validation.
The multivariate normal distribution.
Linear regression
Optimization
Linear Classification revisited
Logistic regression
Linear Discriminant Analysis
Support vector machines
Statistical learning theory
36 / 37
Overview (Part II: Efros)
Memory-based/Instance-based learning
k-nearest-neighbor
Properties of high-dimensional spaces
distance learning
Efficient indexing and retrieval methods
Decision trees
Classification and regression trees
Random Forests
Boosting
Neural networks / Deep Learning
Multilayer perceptrons
Variations such as convolutional nets
Examples and applications
Unsupervised methods
Clustering
Density estimation
Dimensionality reduction
Applications: Collaborative filtering, etc.
37 / 37
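As a small illustration of the memory-based learning listed above, the following Python sketch implements a k-nearest-neighbor classifier. It is a minimal sketch and not from the lecture: the function name knn_predict, the Euclidean distance, the value k=3, and the toy data are assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Sort the stored examples by Euclidean distance to the query point.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    # Majority vote; ties go to the label seen first among the neighbors.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups in the plane.
train_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_points, train_labels, (0.05, 0.05)))
print(knn_predict(train_points, train_labels, (1.0, 0.95)))
```

Sorting every stored point for every query costs O(n log n) per prediction, which is one reason the outline lists efficient indexing and retrieval methods alongside k-nearest-neighbor.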