
Overview

INT305 Machine Learning


Lecture 2
Linear Methods for Regression, Optimization

Jimin Xiao
Department of Intelligent Science
Jimin.xiao@xjtlu.edu.cn

Supervised Learning Setup
Linear Regression - Model
What is Linear? 1 feature vs D features
Linear Regression
Linear Regression - Loss Function
Vectorization
Solving the Minimization Problem
Direct Solution I: Linear Algebra
Direct Solution II: Calculus
Feature Mapping (Basis Expansion)
Polynomial Feature Mapping
Polynomial Feature Mapping with M = 0, 1, 3, 9
Model Complexity and Generalization
Regularization
L2 Regularization
L2 Regularized Least Squares: Ridge regression
Conclusion so far
Gradient Descent
Gradient Descent for Linear Regression
Gradient Descent under the L2 Regularization
Learning Rate (Step Size)
Training Curves
Stochastic Gradient Descent
SGD Learning Rate
Conclusion
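
To make the outline above concrete, here is a minimal NumPy sketch (illustrative, not the lecture's code) of the solution strategies it lists: the direct normal-equations solution, its ridge-regularized variant, and batch gradient descent on the squared-error loss. The data, variable names, and hyperparameters are all made up for the example.

import numpy as np

# Synthetic data: N examples, D features
rng = np.random.default_rng(0)
N, D = 100, 3
X = rng.normal(size=(N, D))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=N)

# Direct solution: solve the normal equations (X^T X) w = X^T y
w_direct = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge regression: add lam * I to the Gram matrix
lam = 0.1
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)

# Batch gradient descent on J(w) = 1/(2N) * ||Xw - y||^2
w = np.zeros(D)
alpha = 0.1  # learning rate (step size)
for _ in range(500):
    w -= alpha * (X.T @ (X @ w - y)) / N  # gradient of J at w

All three estimates should agree closely on this well-conditioned toy problem; they differ in cost and behavior as D grows, which is the motivation for the gradient-descent slides.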


Overview

INT305 Machine Learning


Lecture 3
Linear Classifiers, Logistic Regression, Multiclass Classification

Jimin Xiao
Department of Intelligent Science
Jimin.xiao@xjtlu.edu.cn

Overview
Simplifications
Examples
The Geometric Picture
Summary | Binary Linear Classifiers
Towards Logistic Regression
Loss Functions
Attempt 1: 0-1 loss
Attempt 2: Linear Regression
Attempt 3: Logistic Activation Function
Logistic Regression
Gradient Descent for Logistic Regression
Gradient of Logistic Loss
Multiclass Classification
Multiclass Linear Classification
Softmax Regression
Prove the gradient?
Limits of Linear Classification
Next time...
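
As a companion to the logistic-regression topics above, a minimal NumPy sketch (illustrative, not from the slides) of gradient descent on the average cross-entropy loss, using the standard logistic-loss gradient X^T (y - t) / N; the toy data and settings are assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy linearly separable data with labels t in {0, 1}
rng = np.random.default_rng(1)
N, D = 200, 2
X = rng.normal(size=(N, D))
t = (X[:, 0] + X[:, 1] > 0).astype(float)

# Gradient descent on the average cross-entropy loss
w = np.zeros(D)
alpha = 0.5  # learning rate
for _ in range(1000):
    y = sigmoid(X @ w)                # predicted probabilities
    w -= alpha * (X.T @ (y - t)) / N  # gradient of the logistic loss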

INT305 Machine Learning


Lecture 4
Support Vector Machine, SVM Loss and Softmax Loss

Jimin Xiao
Department of Intelligent Science
Jimin.xiao@xjtlu.edu.cn

Binary Classification with a Linear Model
Zero-One Loss
Separating Hyperplanes
Optimal Separating Hyperplane
Geometry of Points and Planes
Maximizing Margin as an Optimization Problem
Non-Separable Data Points
Maximizing Margin for Non-Separable Data Points
From Margin Violation to Hinge Loss
Multiclass SVM Loss
Softmax
SVM & Softmax
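
To illustrate the two losses compared above, a small NumPy sketch (my own, not the lecture's code) of the multiclass SVM (hinge) loss and the softmax cross-entropy loss for a single example; the scores, label, and margin delta are made up.

import numpy as np

def multiclass_svm_loss(scores, label, delta=1.0):
    # Sum over wrong classes of max(0, s_j - s_label + delta)
    margins = np.maximum(0.0, scores - scores[label] + delta)
    margins[label] = 0.0  # the correct class contributes no loss
    return margins.sum()

def softmax_loss(scores, label):
    # Cross-entropy -log p_label, with scores shifted for numerical stability
    shifted = scores - scores.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

scores = np.array([3.2, 5.1, -1.7])          # class scores f(x; W)
print(multiclass_svm_loss(scores, label=0))  # hinge loss: 2.9
print(softmax_loss(scores, label=0))         # cross-entropy loss

The key contrast: the hinge loss is exactly zero once every margin is satisfied, while the softmax loss keeps pushing the correct-class probability toward 1.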

INT305 Machine Learning


Lecture 5
Neural Network and Back Propagation

Jimin Xiao
Department of Intelligent Science
Jimin.xiao@xjtlu.edu.cn

Neural network
Activation functions
Gradient Descent
Computational Graph
Example 1
Chain rule
Sigmoid
Pattern in backward flow
Exercise 1

Pooling units take n values $x_i$, $i \in [1, n]$, and compute a scalar output whose value is invariant to permutations of the inputs.

1. The Lp-pooling module takes positive inputs and computes $y = \left( \sum_{i=1}^{n} x_i^{p} \right)^{1/p}$. Assuming we know $\frac{\partial E}{\partial y}$, what is $\frac{\partial E}{\partial x_i}$?

2. The log-average module computes $y = \frac{1}{\beta} \ln\!\left( \frac{1}{n} \sum_{i=1}^{n} \exp(\beta x_i) \right)$. Assuming we know $\frac{\partial E}{\partial y}$, what is $\frac{\partial E}{\partial x_i}$?
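
A sketch of the chain-rule solution (my working, not from the slides; $E$ denotes the downstream loss, so $\frac{\partial E}{\partial x_i} = \frac{\partial E}{\partial y} \cdot \frac{\partial y}{\partial x_i}$):

$$
\text{Lp-pooling:}\quad
\frac{\partial y}{\partial x_i}
= \frac{1}{p}\Big(\sum_{j} x_j^{p}\Big)^{\frac{1}{p}-1} \cdot p\,x_i^{p-1}
= \Big(\frac{x_i}{y}\Big)^{p-1},
\qquad
\frac{\partial E}{\partial x_i}
= \frac{\partial E}{\partial y}\Big(\frac{x_i}{y}\Big)^{p-1}.
$$

$$
\text{Log-average:}\quad
\frac{\partial y}{\partial x_i}
= \frac{\exp(\beta x_i)}{\sum_{j}\exp(\beta x_j)},
\qquad
\frac{\partial E}{\partial x_i}
= \frac{\partial E}{\partial y}\,\frac{\exp(\beta x_i)}{\sum_{j}\exp(\beta x_j)}.
$$

Note that the log-average gradient is exactly a softmax weighting of the inputs, so the per-input gradients sum to $\frac{\partial E}{\partial y}$.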
Gradients for vector
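
As a concrete instance of vector-valued gradients, a short NumPy sketch (illustrative; the layer, the loss L = ||y||^2, and all names are assumptions, not the lecture's) that pushes a gradient backward through z = Wx + b and y = sigmoid(z) using local Jacobian-transpose-vector products:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass: z = Wx + b, y = sigmoid(z), L = ||y||^2
rng = np.random.default_rng(2)
W = rng.normal(size=(3, 4))
b = rng.normal(size=3)
x = rng.normal(size=4)

z = W @ x + b
y = sigmoid(z)
L = (y ** 2).sum()

# Backward pass: propagate dL/dy through each node
dL_dy = 2 * y                # dL/dy for L = sum(y^2)
dL_dz = dL_dy * y * (1 - y)  # sigmoid'(z) = y(1 - y), elementwise
dL_dW = np.outer(dL_dz, x)   # same shape as W
dL_db = dL_dz
dL_dx = W.T @ dL_dz          # Jacobian-transpose-vector product

The pattern to notice: each gradient has the same shape as the quantity it differentiates with respect to, and the full Jacobian is never formed explicitly.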
