Lectures 24 and 25: Large-Scale Neural Nets
Today’s Agenda
- Computational Examples
- Training data: $(x_1, y_1), \ldots, (x_n, y_n)$ with labels $y_i \in \{-1, +1\}$
Continuous = better
[Figure: 0-1 loss, logistic loss, and hinge loss plotted against the margin on [-3, 3]; the continuous losses are much easier to optimize than the 0-1 loss.]
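The three losses in the figure can all be written as functions of the margin $m = y \cdot f(x)$. A minimal sketch (function names are mine, not from the slides):

```python
import math

def zero_one_loss(margin):
    # 1 if the prediction disagrees with the label, else 0
    return 1.0 if margin <= 0 else 0.0

def logistic_loss(margin):
    # smooth, everywhere-differentiable surrogate for the 0-1 loss
    return math.log(1.0 + math.exp(-margin))

def hinge_loss(margin):
    # convex surrogate used by SVMs; exactly zero once the margin reaches 1
    return max(0.0, 1.0 - margin)

# All three agree qualitatively: comfortable positive margins are penalized
# little, negative margins (misclassifications) are penalized heavily.
for m in (-2.0, 0.0, 2.0):
    print(m, zero_one_loss(m), logistic_loss(m), hinge_loss(m))
```

Note that the two surrogates upper-bound the 0-1 loss, which is why minimizing them also drives down the classification error.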
- For example, ridge-regularized logistic regression:
  $\min_{\beta} \; \frac{1}{n} \sum_{i=1}^{n} \log\left(1 + e^{-y_i \beta^\top x_i}\right) + \lambda \|\beta\|_2^2$
- The term $\lambda \|\beta\|_2^2$ is the ridge penalty
Gradients
- Recall the partial derivative of $F$ with respect to $\beta_j$, denoted $\partial F / \partial \beta_j$; the gradient $\nabla F(\beta)$ collects all of these partial derivatives
Optimality Conditions
- Recall our optimization problem of interest: $\min_{\beta} F(\beta)$; at an unconstrained local minimum, the gradient must vanish: $\nabla F(\beta^*) = 0$
Gradient Descent
- Problem of Interest: $\min_{\beta} F(\beta)$
- A whole host of other issues have received recent attention: accelerating the rate of convergence, preserving "structure", asynchronous and/or parallel updates, …
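Putting the pieces together, gradient descent on the ridge-penalized logistic loss can be sketched as follows (the data, step size `eta`, and penalty `lam` below are made up for illustration):

```python
import math

def grad_step(beta, X, y, lam, eta):
    """One gradient-descent step for ridge-penalized logistic loss.
    beta: coefficient list; X: rows of features; y: labels in {-1, +1}."""
    n, p = len(X), len(beta)
    grad = [2.0 * lam * b for b in beta]            # gradient of the ridge term
    for xi, yi in zip(X, y):
        dot = sum(b * x for b, x in zip(beta, xi))
        coef = -yi / (1.0 + math.exp(yi * dot))     # d/d(dot) of log(1 + e^{-y*dot})
        for j in range(p):
            grad[j] += coef * xi[j] / n
    return [b - eta * g for b, g in zip(beta, grad)]

# Tiny illustrative run on linearly separable data
X = [[1.0, 2.0], [2.0, 1.0], [-1.0, -1.5], [-2.0, -0.5]]
y = [1, 1, -1, -1]
beta = [0.0, 0.0]
for _ in range(200):
    beta = grad_step(beta, X, y, lam=0.01, eta=0.5)
print(beta)  # coefficients that separate the two classes
```

With a small enough step size the iterates decrease the objective at every step; the acceleration and parallelization tricks mentioned above modify exactly this update.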
- Pros:
- Cons:
Key Ingredients
Input Layer
Hidden Layer
- Notation:
  - Let $w_{ij}$ be the weight from the $i$th node in the input layer to the $j$th node in the hidden layer
  - Let $z_j$ be the input to the $j$th node in the hidden layer
  - Let $b_j$ be the bias term of the $j$th node in the hidden layer
  - So $z_j = \sum_i w_{ij} x_i + b_j$
- Sigmoid: $\sigma(z) = \frac{1}{1 + e^{-z}}$  [Figure: sigmoid activation plotted on $z \in [-5, 5]$]
- ReLU: $\mathrm{ReLU}(z) = \max\{0, z\}$  [Figure: ReLU activation plotted on $z \in [-5, 5]$]
- Threshold/Step: $\mathbb{1}\{z \geq 0\}$  [Figure: step activation plotted on $z \in [-5, 5]$]
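The three activation functions are one-liners; a quick sketch:

```python
import math

def sigmoid(z):
    # smooth squashing of any real input into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    # rectified linear unit: identity for positive inputs, zero otherwise
    return max(0.0, z)

def step(z):
    # threshold activation: not differentiable at 0 and has zero gradient
    # everywhere else, which is why it is unusable with gradient-based training
    return 1.0 if z >= 0 else 0.0

print(sigmoid(0.0), relu(-2.0), relu(3.0), step(-0.1))
```

The sigmoid and ReLU admit useful gradients, which is what back propagation (below) relies on.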
Activation Function
Output Layer
- Notation:
  - Let $v_{jk}$ be the weight from the $j$th node in the hidden layer to the $k$th node in the output layer
  - Let $u_k$ be the input to the $k$th node in the output layer
  - Let $c_k$ be the bias term of the $k$th node in the output layer
- Training data: $(x_1, y_1), \ldots, (x_n, y_n)$
- More notation:
- Therefore:
- Our Task: choose all of the weights and biases to minimize the training loss
Forward Pass
Back Propagation
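The forward pass and one back-propagation update for a one-hidden-layer sigmoid network can be sketched as below. This is an illustrative implementation under my own conventions: `W1[i][j]`, `b1[j]` play the roles of the hidden-layer weights and biases above, `W2[j][k]`, `b2[k]` the output-layer ones, and the loss is squared error:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward pass: returns hidden activations and network outputs."""
    z = [sum(W1[i][j] * x[i] for i in range(len(x))) + b1[j]
         for j in range(len(b1))]          # inputs to hidden nodes
    h = [sigmoid(zj) for zj in z]          # hidden activations
    u = [sum(W2[j][k] * h[j] for j in range(len(h))) + b2[k]
         for k in range(len(b2))]          # inputs to output nodes
    o = [sigmoid(uk) for uk in u]          # outputs
    return h, o

def backprop(x, t, W1, b1, W2, b2, eta):
    """One back-propagation step for the loss 0.5 * sum_k (o_k - t_k)^2."""
    h, o = forward(x, W1, b1, W2, b2)
    # output-layer error terms dL/du_k (sigmoid derivative is o*(1-o))
    delta_out = [(o[k] - t[k]) * o[k] * (1 - o[k]) for k in range(len(o))]
    # hidden-layer error terms, chaining back through the output weights
    delta_hid = [h[j] * (1 - h[j]) *
                 sum(W2[j][k] * delta_out[k] for k in range(len(o)))
                 for j in range(len(h))]
    # gradient-descent updates on every weight and bias
    for j in range(len(h)):
        for k in range(len(o)):
            W2[j][k] -= eta * delta_out[k] * h[j]
    for k in range(len(o)):
        b2[k] -= eta * delta_out[k]
    for i in range(len(x)):
        for j in range(len(h)):
            W1[i][j] -= eta * delta_hid[j] * x[i]
    for j in range(len(h)):
        b1[j] -= eta * delta_hid[j]
```

The key point of back propagation is that the `delta_hid` terms reuse the `delta_out` terms, so the gradient of every weight is computed in one backward sweep rather than by differentiating each weight separately.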
Deep Networks
- Deep neural networks are obtained simply by adding more layers
[Figure: a deep network with an input layer, several hidden layers, and an output layer]
Convolutional Networks
Max Pooling
The bigger the pooling window, the more global the information it summarizes, but the less local structure it picks up.
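Max pooling just takes the maximum over each (non-overlapping) window; a sketch in plain Python, assuming the image dimensions divide evenly by the window size:

```python
def max_pool(image, window):
    """Non-overlapping max pooling on a 2-D grid (list of lists)."""
    rows, cols = len(image), len(image[0])
    return [[max(image[r + dr][c + dc]
                 for dr in range(window) for dc in range(window))
             for c in range(0, cols, window)]
            for r in range(0, rows, window)]

img = [[1, 3, 2, 0],
       [4, 2, 1, 1],
       [0, 1, 5, 6],
       [2, 2, 7, 8]]
print(max_pool(img, 2))  # -> [[4, 2], [2, 8]]
```

Each 2x2 block collapses to its maximum, halving each spatial dimension while keeping the strongest response in each region.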
IEOR 242, Fall 2019 - Lecture 24
Convolutional Networks: Extensions
- Of course, the window size can be set however you like
- There are also different options for handling the border cases (here we used "same" zero padding)
- Most importantly, this extends to height × width × channels
  - The number of channels can vary throughout the network and is also referred to as the number of filters
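"Same" zero padding means out-of-bounds pixels are treated as zero, so the output has the same spatial size as the input. A single-channel sketch in plain Python (function name mine; real frameworks extend this over channels/filters):

```python
def conv2d_same(image, kernel):
    """2-D convolution (cross-correlation) with 'same' zero padding.
    Assumes an odd-sized square kernel."""
    n, m = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < n and 0 <= jj < m:  # zero padding at the border
                        s += image[ii][jj] * kernel[di + r][dj + r]
            out[i][j] = s
    return out

# 3x3 averaging filter on a constant 3x3 image: the center output is 1.0,
# while the border outputs are pulled down by the zero padding
img = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
avg = [[1 / 9] * 3 for _ in range(3)]
print(conv2d_same(img, avg))
```

The padded borders averaging in zeros is exactly the border effect the slide refers to; "valid" convolution would instead shrink the output and avoid it.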
Keras
- …
- S5: "Suit of card #5"
- C5: "Rank of card #5"
Baseline Model
LDA Model
- Accuracy history: [Figure: accuracy per training epoch]
- Accuracy history: [Figure: accuracy per training epoch]
CTR Prediction
Data in Theory….
y  | x1  x2  x3  x4  …
y1 | x11 x12 x13 x14 …
y2 | x21 x22 x23 x24 …
y3 | x31 x32 x33 x34 …
y4 | x41 x42 x43 x44 …
y5 | x51 x52 x53 x54 …
⋮  |  ⋮   ⋮   ⋮   ⋮
Data in Practice…
Data in Practice
Conclusions