Machine Learning Theory: Instructor
Instructor
• Jun-Won Choi
• Office: Engineering Center Annex (공업센터별관) 303-1
• Phone: 02-2220-2316
• Office hours: meetings can be scheduled individually via email
Prerequisite
• Basic knowledge of probability theory and linear algebra
Textbook
Main textbook: Bishop, “Pattern Recognition and Machine Learning”
Deep learning
• Selected as one of MIT Technology Review’s “10 Breakthrough Technologies 2013”
Deep learning = Deep neural network (DNN)
Deep = many hidden layers
[Figure: data → representation of data → output F(x)]
Applications of deep learning
Computer vision
• Image classification, object detection, face recognition, face detection,
action recognition, scene understanding
Natural language processing & speech
• Speech recognition, question answering, machine translation, image captioning
Regression & forecasting
• Time-series analysis, financial modeling
Medical diagnosis
• Breast cancer cell mitosis detection, bio-informatics
Autonomous vehicle & robotics
• Pedestrian detection, traffic sign recognition, lane tracking, collision
avoidance, driver behavior modeling
Performance of deep learning
ImageNet: large-scale image dataset (about 14 million labeled images)
ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
• Image classification competition with 1.2 million labeled training images and 1,000
categories
• Annual challenge since 2010
Dark age of neural networks
Most of the theory was developed in the 1980s–1990s
However, neural networks were not successful in practice
• Vanishing gradient problem
• Slow training
• Local optima
• The parameter space is highly nonconvex
• Gradient descent easily gets stuck in bad local minima
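The local-minima issue can be illustrated with a toy example. A minimal sketch (the function and step size are assumed for illustration, not from the slides): plain gradient descent on a nonconvex function converges to whichever minimum its starting point happens to lie near.

```python
# Gradient descent on the nonconvex f(x) = x^4 - 3x^2 + x, which has a shallow
# local minimum near x = 1.13 and a better global minimum near x = -1.30.
def f(x):
    return x**4 - 3 * x**2 + x

def grad_f(x):
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=1000):
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

x_bad = gradient_descent(x0=1.0)    # starts in the basin of the shallow minimum
x_good = gradient_descent(x0=-1.0)  # starts in the basin of the global minimum
print(f(x_bad), f(x_good))          # f(x_bad) is strictly worse than f(x_good)
```

The algorithm never escapes the bad basin: only the initialization decides which minimum is found.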
Most machine learning research focused on model-based learning
• Support vector machines (SVM), Gaussian mixture models (GMM), kernel
methods
Renaissance of deep neural network
Unsupervised pre-training (Hinton, 2006)
• Training using back-propagation suffers from local minima and over-fitting
• Weights are pre-trained without labels (unsupervised learning)
• After pre-training, fine-tuning is performed using labels.
Restricted Boltzmann Machine (RBM)
Renaissance of deep neural network
Top researchers have been hired by companies
Geoffrey Hinton (Toronto)
Andrew Ng (Stanford)
Popular structures of deep neural networks
Fully-connected deep neural network (DNN)
Convolutional neural network (CNN)
Reinforcement learning
Application of CNN: object detection
Experimental results
Application of reinforcement learning
AlphaGo
Learning from data
• Regression
• Classification
Classification example [Duda] pp. 1-9
Salmon
Sea bass
• We take a picture of each fish and decide whether it is a salmon or a sea bass
based on the picture.
• The two fish differ in length, lightness, width, number and shape of fins,
position of the mouth, …
• There are variations in lighting and in the position of the fish on the conveyor.
Classification example
Learning from the data
• Supervised learning vs unsupervised learning
• Supervised learning: we know the right answer called “label”.
[Figure: labeled training data (S = salmon, B = sea bass); classification pipeline: data → feature extraction → classifier → salmon? / sea bass?]
Classification example
Feature extraction
• Length
• Lightness
[Figure: histograms of length and lightness for the two classes]
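As a toy illustration of the single-feature approach, a minimal sketch with assumed synthetic lightness values (the numbers are made up, not from [Duda]): a classifier that simply thresholds the lightness feature.

```python
# Classify fish as salmon or sea bass by thresholding a single feature.
import random

random.seed(0)
# Assumed feature model: salmon tend to be lighter than sea bass.
salmon = [random.gauss(7.0, 1.0) for _ in range(100)]     # lightness values
sea_bass = [random.gauss(4.0, 1.0) for _ in range(100)]

def classify(lightness, threshold=5.5):
    """Decide 'salmon' if the lightness exceeds the threshold."""
    return "salmon" if lightness > threshold else "sea bass"

correct = sum(classify(v) == "salmon" for v in salmon) + \
          sum(classify(v) == "sea bass" for v in sea_bass)
print(correct / 200)  # most fish are classified correctly, but the classes overlap
```

Because the two histograms overlap, no single threshold separates the classes perfectly; that is exactly the motivation for combining several features.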
Classification example
Feature space
[Figure: predicting the price (600k, 800k, ?) of a house from its size (2000–3000 sq ft)]
Regression example
Curve fitting problem
• Ten data points are given.
• The data are generated from the model t = sin(2πx) + ε, where ε is random noise.
Regression example
Polynomial curve fitting
• We fit the data using a polynomial function of the form
  y(x, w) = w₀ + w₁x + w₂x² + … + w_M x^M
• The weights w of the polynomial are tuned during learning.
• Because the data contain random noise, fitting them too closely leads to the
overfitting problem.
Regression example
Overfitting problem
• With M = 9, we can make the training error zero.
• However, the test error then becomes very large.
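The overfitting behavior can be reproduced in a short sketch, assuming the standard setup of 10 noisy samples of sin(2πx) and using `numpy.polyfit` as the least-squares fitter:

```python
# 10 noisy samples of sin(2*pi*x), fit with a degree-9 polynomial
# (M = 9: one parameter per data point, so the fit interpolates the data).
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 10)

w = np.polyfit(x_train, t_train, deg=9)            # interpolates the 10 points
train_err = np.mean((np.polyval(w, x_train) - t_train) ** 2)

x_test = np.linspace(0.05, 0.95, 50)               # fresh points from the same model
t_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 50)
test_err = np.mean((np.polyval(w, x_test) - t_test) ** 2)

print(train_err, test_err)  # training error is (near) zero; test error is much larger
```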
Regression example
How to overcome overfitting
• More data points can solve overfitting. M=9
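A minimal sketch of this effect, assuming the same sin(2πx) + noise model: with M = 9 held fixed, growing the training set from 10 to 200 points shrinks the error against the true function.

```python
# Keep M = 9 fixed and compare 10 vs. 200 noisy training samples of
# sin(2*pi*x); measure the error of the fit against the noise-free function.
import numpy as np

rng = np.random.default_rng(1)

def fit_and_test(n_train, deg=9):
    x = np.linspace(0, 1, n_train)
    t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, n_train)
    w = np.polyfit(x, t, deg)                      # least-squares polynomial fit
    x_te = np.linspace(0, 1, 200)
    return np.mean((np.polyval(w, x_te) - np.sin(2 * np.pi * x_te)) ** 2)

err_small = fit_and_test(10)    # 10 points: the polynomial fits the noise
err_large = fit_and_test(200)   # 200 points: the noise averages out
print(err_small, err_large)     # error is much smaller with more data
```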
Probability theory
• Conditional probability: p(x|y) = p(x, y) / p(y)
• Joint probability distribution: p(x, y) = p(y|x) p(x)
• Expectation: E[f] = Σₓ p(x) f(x)
• Conditional expectation: Eₓ[f|y] = Σₓ p(x|y) f(x)
• Variance: var[x] = E[(x − E[x])²] = E[x²] − E[x]²
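The expectation and variance definitions can be checked numerically; a minimal sketch on an assumed toy discrete distribution:

```python
# Expectation and variance of a toy discrete distribution over x in {0,1,2,3}.
import numpy as np

x = np.array([0, 1, 2, 3])
p = np.array([0.1, 0.2, 0.3, 0.4])    # p(x); the probabilities sum to 1

E_x = np.sum(p * x)                    # E[x]   = sum_x p(x) x
E_x2 = np.sum(p * x ** 2)              # E[x^2]
var_x = E_x2 - E_x ** 2                # var[x] = E[x^2] - E[x]^2

print(E_x, var_x)  # E[x] = 2.0, var[x] = 1.0 (up to float rounding)
```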
Probability theory
• Covariance: cov[x, y] = E[(x − E[x])(y − E[y])]
• Covariance matrix: cov[x] = E[(x − E[x])(x − E[x])ᵀ]
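A minimal sketch (assumed synthetic data) estimating a covariance matrix from samples with `numpy.cov`:

```python
# Sample covariance matrix of two correlated variables.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 1000)
y = 2.0 * x + rng.normal(0.0, 0.5, 1000)   # y depends linearly on x plus noise

C = np.cov(np.vstack([x, y]))              # 2x2 symmetric covariance matrix
print(C)  # roughly [[1, 2], [2, 4.25]]: var[x]=1, cov[x,y]=2, var[y]=4+0.25
```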
Probability theory
Bayes’ theorem: p(w|D) = p(D|w) p(w) / p(D)
• p(D|w): likelihood function
• p(w): prior distribution
• p(w|D): posterior distribution
• D: observed data
• Log-likelihood function: ln p(D|w)
• Maximizing the posterior of w: ln p(w|D) = ln p(D|w) + ln p(w) + const, where
the log-prior ln p(w) acts as a regularization term
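A minimal numerical sketch of Bayes’ theorem, with assumed toy numbers loosely based on the fish example:

```python
# Bayes' theorem: p(salmon | light) = p(light | salmon) p(salmon) / p(light)
prior_salmon = 0.3                 # p(salmon): prior
prior_bass = 0.7                   # p(sea bass): prior
lik_salmon = 0.8                   # p(light | salmon): likelihood
lik_bass = 0.2                     # p(light | sea bass): likelihood

evidence = lik_salmon * prior_salmon + lik_bass * prior_bass   # p(light)
posterior_salmon = lik_salmon * prior_salmon / evidence        # p(salmon | light)
print(posterior_salmon)  # 0.24 / 0.38 ≈ 0.632, up from the prior 0.3
```

Observing a light-colored fish raises the probability of “salmon” from the prior 0.3 to about 0.63.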
Probability theory
Bayesian curve fitting
• Predictive distribution of the target t given the input x, obtained by
integrating over the posterior of w: p(t|x, X, T) = ∫ p(t|x, w) p(w|X, T) dw