
Stochastic Gradient Descent

10-701 Recitation 3

Mu Li

Computer Science Department


Carnegie Mellon University

February 5, 2013
The problem

- A typical machine learning problem has a penalty/regularizer + loss form:

      min_w F(w) = g(w) + (1/n) ∑_{i=1}^n f(w; y_i, x_i)

  where x_i, w ∈ R^p, y_i ∈ R, and both g and f are convex


- Today we only consider differentiable f, and let g = 0 for simplicity
- For example, if we let f(w; y_i, x_i) = −log p(y_i | x_i, w), then we are maximizing the log likelihood:

      max_w (1/n) ∑_{i=1}^n log p(y_i | x_i, w)
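
To make this concrete, here is a minimal Python/NumPy sketch (not part of the original slides) of the objective with g(w) = 0 and the logistic negative log-likelihood as f; the data, sizes, and names are all hypothetical.

import numpy as np

def F(w, X, y):
    # empirical loss (1/n) * sum_i f(w; y_i, x_i) with g(w) = 0, where
    # f(w; y_i, x_i) = -log p(y_i | x_i, w) = log(1 + exp(-y_i <w, x_i>)), y_i in {-1, +1}
    margins = y * (X @ w)
    return np.mean(np.log1p(np.exp(-margins)))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                        # n = 100 samples, p = 5 features
y = np.sign(X @ rng.normal(size=5) + 0.1 * rng.normal(size=100))
print(F(np.zeros(5), X, y))                          # loss at w = 0 is log 2
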
Gradient Descent

- Choose an initial w^(0), then repeat

      w^(t+1) = w^(t) − η_t ∇F(w^(t))

  until stop
- η_t is the learning rate, and

      ∇F(w^(t)) = (1/n) ∑_i ∇_w f(w^(t); y_i, x_i)

- How to stop? ‖w^(t+1) − w^(t)‖ ≤ ε or ‖∇F(w^(t))‖ ≤ ε

[Figure: two-dimensional example]
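
A minimal sketch of this loop in Python/NumPy (not from the slides), assuming the logistic loss from the previous sketch and a hand-picked fixed learning rate η = 0.1; it stops once the gradient norm falls below a tolerance, i.e. the second stopping rule above.

import numpy as np

def grad_F(w, X, y):
    # (1/n) * sum_i grad_w f(w; y_i, x_i) for the logistic loss
    s = 1.0 / (1.0 + np.exp(y * (X @ w)))    # sigmoid(-y_i <w, x_i>) per sample
    return -(X.T @ (y * s)) / len(y)

def gradient_descent(X, y, eta=0.1, eps=1e-6, max_iter=1000):
    w = np.zeros(X.shape[1])                 # initial w^(0)
    for t in range(max_iter):
        g = grad_F(w, X, y)
        if np.linalg.norm(g) <= eps:         # stop when ||grad F(w^(t))|| <= eps
            break
        w = w - eta * g                      # w^(t+1) = w^(t) - eta_t * grad F(w^(t))
    return w
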
Learning rate matters

[Figures: a learning rate that is too small (after 100 iterations) vs. one that is too big]

Backtracking line search

Adaptively choose the learning rate:

- Choose a parameter 0 < β < 1
- Start with η = 1, and repeat for t = 0, 1, . . .
- While

      F(w^(t) − η ∇F(w^(t))) > F(w^(t)) − (η/2) ‖∇F(w^(t))‖²

  update η = βη
- Set w^(t+1) = w^(t) − η ∇F(w^(t))
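
A minimal sketch of one backtracking step in Python/NumPy (not from the slides); F and grad_F are passed in as callables for the objective and its gradient, and the default β = 0.8 matches the choice on the next slide.

import numpy as np

def backtracking_step(F, grad_F, w, beta=0.8):
    g = grad_F(w)
    eta = 1.0                                             # start with eta = 1
    # shrink eta until F(w - eta g) <= F(w) - (eta/2) ||grad F(w)||^2
    while F(w - eta * g) > F(w) - 0.5 * eta * np.dot(g, g):
        eta = beta * eta
    return w - eta * g                                    # w^(t+1)
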
Backtracking line search

A typical choice is β = 0.8; on this example it converges after 13 iterations:

[Figure: convergence with backtracking line search]


Stochastic Gradient Descent

- We call (1/n) ∑_i f(w; y_i, x_i) the empirical loss; the thing we really hope to minimize is the expected loss

      f(w) = E_{y_i, x_i} f(w; y_i, x_i)

- Suppose we receive an infinite stream of samples (y_t, x_t) from the distribution; one way to optimize this objective is

      w^(t+1) = w^(t) − η_t ∇_w f(w^(t); y_t, x_t)

- In practice, we simulate the stream by picking (y_t, x_t) uniformly at random from the samples we have
- Compared with the average gradient (1/n) ∑_i ∇_w f(w^(t); y_i, x_i) that GD computes, SGD uses the gradient of a single random sample, which is an unbiased estimate of it
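
A minimal SGD sketch in Python/NumPy (not from the slides) for the same hypothetical logistic loss: each step picks one sample at random and follows its gradient; the decaying learning rate η_t = 1/√(t+1) is an assumption, not something the slides fix.

import numpy as np

def sgd(X, y, num_steps=10000):
    rng = np.random.default_rng(0)
    n, p = X.shape
    w = np.zeros(p)
    for t in range(num_steps):
        i = rng.integers(n)                          # simulate the stream: one random sample
        margin = y[i] * (X[i] @ w)
        g = -y[i] * X[i] / (1.0 + np.exp(margin))    # grad_w f(w; y_i, x_i) for the logistic loss
        w = w - g / np.sqrt(t + 1)                   # w^(t+1) = w^(t) - eta_t * g
    return w
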
More about SGD

- The objective does not always decrease at each step
- Compared to GD, SGD needs more steps, but each step is cheaper
- A mini-batch (e.g., pick 100 samples and average their gradients) may accelerate convergence; see the sketch below
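
A minimal sketch of the mini-batch variant (not from the slides): average the per-sample gradients over a random batch of 100 before each step; the batch size and learning rate here are hypothetical.

import numpy as np

def minibatch_sgd(X, y, batch_size=100, eta=0.1, num_steps=1000):
    rng = np.random.default_rng(0)
    n, p = X.shape
    w = np.zeros(p)
    for t in range(num_steps):
        idx = rng.choice(n, size=min(batch_size, n), replace=False)
        Xb, yb = X[idx], y[idx]
        s = 1.0 / (1.0 + np.exp(yb * (Xb @ w)))      # sigmoid(-y_i <w, x_i>) per batch sample
        g = -(Xb.T @ (yb * s)) / len(idx)            # gradient averaged over the batch
        w = w - eta * g
    return w
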
Relation to Perceptron

- Recall the Perceptron: initialize w, then repeat

      w ← w + y_i x_i   if y_i ⟨w, x_i⟩ < 0
      w ← w             otherwise

- Fix the learning rate η = 1 and let f(w; y_i, x_i) = max(0, −y_i ⟨w, x_i⟩); then

      ∇_w f(w; y_i, x_i) = −y_i x_i   if y_i ⟨w, x_i⟩ < 0
      ∇_w f(w; y_i, x_i) = 0          otherwise

  so the SGD update on this loss is exactly the Perceptron update: we derive the Perceptron from SGD
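
A minimal sketch (not from the slides) of the Perceptron recovered as SGD with η = 1 on f(w; y, x) = max(0, −y ⟨w, x⟩); the only liberty taken is using ≤ 0 instead of the strict < 0 so that updates can fire from the zero initialization.

import numpy as np

def perceptron(X, y, num_epochs=10):
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(num_epochs):
        for i in range(n):
            if y[i] * (X[i] @ w) <= 0:   # misclassified (the slide uses a strict <)
                w = w + y[i] * X[i]      # SGD step with eta = 1: w - 1 * (-y_i x_i)
    return w
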
Questions?
