TensorFlow and deep learning, without a PhD
deep Science! deep Code...
#Tensorflow @martin_gorner
Hello World: handwritten digits classification - MNIST
MNIST = Modified National Institute of Standards and Technology. Download the dataset at http://yann.lecun.com/exdb/mnist/
Very simple model: softmax classification
[Diagram: a 28x28-pixel image is flattened into 784 pixels; each of the 10 output neurons (digits 0, 1, 2 … 9) computes a weighted sum of all pixels plus a bias, and softmax turns the neuron outputs into probabilities.]
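To make "weighted sum of all pixels + bias, then softmax" concrete, here is a minimal NumPy sketch; the function name and the max-subtraction stability trick are mine, not from the slides:

import numpy as np

def softmax(logits):
    # subtract the max for numerical stability; does not change the result
    e = np.exp(logits - logits.max())
    return e / e.sum()

# x: one flattened image [784], W: weights [784, 10], b: biases [10]
x = np.random.rand(784)
W = np.zeros((784, 10))
b = np.zeros(10)
probabilities = softmax(x.dot(W) + b)  # 10 values summing to 1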
[Diagram: processing a mini-batch. X holds 100 images, one per line, each flattened to 784 pixels; W is the weight matrix with 784 lines and 10 columns (w0,0 … w783,9, one column per output neuron). The matrix multiply X·W computes the weighted sums for all lines at once; the bias vector is broadcast, i.e. applied line by line.]
In TensorFlow (tensor shapes in [ ]):

Y = tf.nn.softmax(tf.matmul(X, W) + b)
# tf.matmul: matrix multiply on all lines; + b: bias broadcast, applied line by line
# Y [100, 10]   X [100, 784]   W [784, 10]   b [10]
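The placeholders and variables feeding this formula are not on this slide; a sketch consistent with the shapes above (TF 1.x style; None lets the batch size vary, 100 in the diagram; zero initialisation is fine for a single softmax layer):

X = tf.placeholder(tf.float32, [None, 784])   # mini-batch of flattened images
W = tf.Variable(tf.zeros([784, 10]))          # one column of weights per digit
b = tf.Variable(tf.zeros([10]))               # one bias per digit
Y = tf.nn.softmax(tf.matmul(X, W) + b)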
Cross-entropy: compare the known label (the one-hot vector for "this is a 6") with the computed probabilities, for digits 0 1 2 3 4 5 6 7 8 9:

  computed probabilities: 0.1 0.2 0.1 0.3 0.2 0.1 0.9 0.2 0.1 0.1
  actual probabilities:   0   0   0   0   0   0   1   0   0   0

cross entropy = -Σ Yi' · log(Yi)   (Y' = actual, Y = computed)
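In code, with Y_ holding the one-hot labels (a sketch; normalisation by batch size is a detail the slides gloss over):

Y_ = tf.placeholder(tf.float32, [None, 10])      # one-hot labels, e.g. "this is a 6"
cross_entropy = -tf.reduce_sum(Y_ * tf.log(Y))   # -Σ Yi' · log(Yi)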
init = tf.global_variables_initializer()  # tf.initialize_all_variables() in older TensorFlow
optimizer = tf.train.GradientDescentOptimizer(0.003)  # 0.003 is the learning rate
train_step = optimizer.minimize(cross_entropy)        # cross_entropy is the loss function
# train
sess.run(train_step, feed_dict=train_data)

# success?
a, c = sess.run([accuracy, cross_entropy], feed_dict=train_data)

# success on test data? (tip: do this every 100 iterations only)
test_data = {X: mnist.test.images, Y_: mnist.test.labels}
a, c = sess.run([accuracy, cross_entropy], feed_dict=test_data)
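Putting it together, the training loop looks roughly like this (a sketch: the session setup, the mini-batch size of 100, and the mnist object from the tutorial's input_data helper follow the deck's conventions):

sess = tf.Session()
sess.run(init)

for i in range(1000):
    # load a mini-batch of 100 images with their one-hot labels
    batch_X, batch_Y = mnist.train.next_batch(100)
    train_data = {X: batch_X, Y_: batch_Y}

    # train
    sess.run(train_step, feed_dict=train_data)

    # success on test data, every 100 iterations
    if i % 100 == 0:
        test_data = {X: mnist.test.images, Y_: mnist.test.labels}
        a, c = sess.run([accuracy, cross_entropy], feed_dict=test_data)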
TensorFlow - full Python code: the initialisation, model, loss and training step assembled as shown above.

import tensorflow as tf

# training step
optimizer = tf.train.GradientDescentOptimizer(0.003)
train_step = optimizer.minimize(cross_entropy)
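One piece the extracted slides never define is the accuracy used earlier; a sketch consistent with the rest of the code:

# fraction of images where the most likely predicted digit equals the label
correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))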
The toolbox so far: Softmax, Cross-entropy, Mini-batch.

Go deep!
Let's try 5 fully-connected layers! (overkill ;-)
[Diagram: layer sizes 784 → 200 → 100 → 60 → 30 → 10; sigmoid function on the hidden layers, softmax on the 10 outputs (digits 0, 1, 2 … 9).]

K = 200  # hidden layer sizes
L = 100
M = 60
N = 30
# weights initialised with random values
W1 = tf.Variable(tf.truncated_normal([28*28, K], stddev=0.1))
B1 = tf.Variable(tf.zeros([K]))
Y1 = tf.nn.sigmoid(tf.matmul(X, W1) + B1)
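The remaining four layers repeat the same pattern; a sketch by analogy (the names W2…W5, B2…B5 and Y2…Y4 follow the deck's conventions):

W2 = tf.Variable(tf.truncated_normal([K, L], stddev=0.1))
B2 = tf.Variable(tf.zeros([L]))
W3 = tf.Variable(tf.truncated_normal([L, M], stddev=0.1))
B3 = tf.Variable(tf.zeros([M]))
W4 = tf.Variable(tf.truncated_normal([M, N], stddev=0.1))
B4 = tf.Variable(tf.zeros([N]))
W5 = tf.Variable(tf.truncated_normal([N, 10], stddev=0.1))
B5 = tf.Variable(tf.zeros([10]))

Y2 = tf.nn.sigmoid(tf.matmul(Y1, W2) + B2)
Y3 = tf.nn.sigmoid(tf.matmul(Y2, W3) + B3)
Y4 = tf.nn.sigmoid(tf.matmul(Y3, W4) + B4)
Y  = tf.nn.softmax(tf.matmul(Y4, W5) + B5)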
RELU: on a deep network the sigmoid is a poor choice (yuck!); use the RELU activation function instead:

Y1 = tf.nn.relu(tf.matmul(X, W1) + B1)
Slow down … learning rate decay.
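The results slide below decays the learning rate from 0.003 down to 0.0001; a sketch of one way to implement it with an exponential schedule (the decay speed, 2000 steps, is an assumption, not from these slides):

import math

lr = tf.placeholder(tf.float32)  # feed the current learning rate at each step
train_step = tf.train.GradientDescentOptimizer(lr).minimize(cross_entropy)

for i in range(10000):
    # decay exponentially from 0.003 towards 0.0001
    learning_rate = 0.0001 + 0.003 * math.exp(-i / 2000.0)
    batch_X, batch_Y = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={X: batch_X, Y_: batch_Y, lr: learning_rate})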
Dying …
pkeep = tf.placeholder(tf.float32)   # TRAINING: pkeep=0.75, EVALUATION: pkeep=1

Yf = tf.nn.relu(tf.matmul(X, W) + B)
Y = tf.nn.dropout(Yf, pkeep)
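The same graph serves both phases; only the fed value of pkeep changes (a sketch):

sess.run(train_step, feed_dict={X: batch_X, Y_: batch_Y, pkeep: 0.75})  # training: drop 25% of neurons
a, c = sess.run([accuracy, cross_entropy],
                feed_dict={X: mnist.test.images, Y_: mnist.test.labels, pkeep: 1.0})  # evaluation: keep all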
Dropout
[Plots: without dropout the activations are dying … dead; with dropout they stay alive.]
Demo
98%
All the party tricks: 98.2% peak, 97.9% sustained.

[Plot: test accuracy for sigmoid with learning rate 0.003; RELU with learning rate 0.003; and RELU with decaying learning rate 0.003 -> 0.0001 and dropout 0.75.]
Overfitting: a BAD network, or not enough DATA.
Convolutional layer

[Diagram: stacked convolutional layers with subsampling by stride, and padding.]

W[4, 4, 3, 2]: filter size 4x4, 3 input channels, 2 output channels (one set of filter weights W1[4, 4, 3], W2[4, 4, 3] per output channel).
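In TensorFlow a convolutional layer is one call; a sketch matching the shapes above, assuming a 4D image input X[batch, height, width, 3] ('SAME' padding preserves the image size; the stride vector is [1, stride, stride, 1]):

# filters W[4, 4, 3, 2]: input [batch, h, w, 3] -> output [batch, h, w, 2]
W = tf.Variable(tf.truncated_normal([4, 4, 3, 2], stddev=0.1))
B = tf.Variable(tf.ones([2])/10)
Y = tf.nn.relu(tf.nn.conv2d(X, W, strides=[1, 1, 1, 1], padding='SAME') + B)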
ALL Convolutional
Convolutional neural network (+ biases on all layers):

28x28x1   input image
28x28x4   convolutional layer, 4 channels: W1[5, 5, 1, 4], stride 1
14x14x8   convolutional layer, 8 channels: W2[4, 4, 4, 8], stride 2
# weights initialised with random values; N = 200
W1 = tf.Variable(tf.truncated_normal([5, 5, 1, K], stddev=0.1))
B1 = tf.Variable(tf.ones([K])/10)
W2 = tf.Variable(tf.truncated_normal([5, 5, K, L], stddev=0.1))
B2 = tf.Variable(tf.ones([L])/10)
W3 = tf.Variable(tf.truncated_normal([4, 4, L, M], stddev=0.1))
B3 = tf.Variable(tf.ones([M])/10)
W4 = tf.Variable(tf.truncated_normal([7*7*M, N], stddev=0.1))
B4 = tf.Variable(tf.ones([N])/10)
W5 = tf.Variable(tf.truncated_normal([N, 10], stddev=0.1))
B5 = tf.Variable(tf.ones([10])/10)
TensorFlow - the model
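The wiring of these weights into a model is missing from the extracted text; a sketch in the deck's style, assuming the image input X[batch, 28, 28, 1] (stride passed to tf.nn.conv2d, 'SAME' padding, shapes matching the diagram above):

# convolutional layers: 28x28x1 -> 28x28xK (stride 1) -> 14x14xL (stride 2) -> 7x7xM (stride 2)
Y1 = tf.nn.relu(tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding='SAME') + B1)
Y2 = tf.nn.relu(tf.nn.conv2d(Y1, W2, strides=[1, 2, 2, 1], padding='SAME') + B2)
Y3 = tf.nn.relu(tf.nn.conv2d(Y2, W3, strides=[1, 2, 2, 1], padding='SAME') + B3)

# flatten, then fully-connected layer (N = 200) and softmax readout
YY = tf.reshape(Y3, shape=[-1, 7 * 7 * M])
Y4 = tf.nn.relu(tf.matmul(YY, W4) + B4)
Y = tf.nn.softmax(tf.matmul(Y4, W5) + B5)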
[Plot: test accuracy, with dropout.]
The full toolbox: Softmax, Cross-entropy, Mini-batch, RELU!, Learning rate decay, Dropout, ALL Convolutional.

Go deep!
Pre-trained models:
goo.gl/pHeXe7
3. Theory (sit back and listen): hidden layers, sigmoid activation function (slides 16-19)
4. Practice: start from the file you have and add one or two hidden layers. Use cross_entropy_with_logits to avoid numerical instabilities with log(0). Solution in: mnist_2.0_five_layers_sigmoid.py
5. Theory (sit back and listen): the neural network toolbox: RELUs, learning rate decay, dropout, overfitting (slides 20-35)
8. Theory (sit back and listen): convolutional networks (slides 36-42)
9. Practice: replace your model with a convolutional network, without dropout. Solution in: mnist_3.0_convolutional.py
10. Practice (if time allows): try a bigger neural network (good hyperparameters on slide 44) and add dropout on the last layer. Solution in: mnist_3.0_convolutional_bigger_dropout.py