CNN With Tensor Flow 1-32
Gradient Descent Optimization
Stochastic Gradient Descent (SGD)
Stochastically sample “mini-batches” Bj from the dataset D
A mini-batch Bj can contain as few as a single sample
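A minimal sketch of the mini-batch sampling described above, fitting a toy linear model with SGD. The dataset, learning rate, and batch size are illustrative assumptions, not from the slides; `batch_size=1` shows the extreme case where Bj holds a single sample.

```python
import numpy as np

# Hypothetical toy dataset D: 100 samples of y = 2x + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=100)

def sgd_step(w, X, y, batch_size, lr, rng):
    """One SGD update: stochastically sample a mini-batch Bj from D
    and step along the negative gradient computed on Bj only."""
    idx = rng.choice(len(X), size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    grad = 2.0 * Xb.T @ (Xb @ w - yb) / batch_size  # MSE gradient on Bj
    return w - lr * grad

w = np.zeros(1)
for _ in range(500):
    # batch_size=1: each update sees just one sample from D
    w = sgd_step(w, X, y, batch_size=1, lr=0.05, rng=rng)
```

Despite each step seeing only one sample, the weight drifts toward the true slope of 2, illustrating why such tiny batches still work.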
● Assumption (classical optimization): test data will be no different from the training data
● In ML we cannot make this assumption ==> test data always differ from the training data
Gradient Descent (GD) vs Stochastic Gradient Descent (SGD)
Stochastic Gradient Descent
● SGD is preferred to Gradient Descent (GD)
● (A bit) Noisy gradients act as regularization
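The GD vs SGD contrast above can be sketched numerically: the mini-batch gradient is a noisy but unbiased estimate of the full-batch gradient. The dataset, batch size of 32, and number of batches below are illustrative assumptions.

```python
import numpy as np

# Hypothetical dataset D of 1000 samples with y = 3x
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 1))
y = 3.0 * X[:, 0]
w = np.zeros(1)

def mse_grad(Xb, yb, w):
    """Gradient of the MSE loss on the batch (Xb, yb)."""
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(Xb)

# GD: one exact (expensive) gradient over all of D
full_grad = mse_grad(X, y, w)

# SGD: many cheap, noisy gradients from random mini-batches of size 32
mini_grads = np.array([
    mse_grad(X[idx], y[idx], w)
    for idx in (rng.choice(len(X), size=32, replace=False)
                for _ in range(200))
])
```

The mini-batch gradients scatter around the full gradient (nonzero spread, matching mean); that scatter is the “(a bit) noisy” behavior the slide credits with a regularizing effect.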