
TensorFlow and deep learning, without a PhD

#Tensorflow @martin_gorner
Hello World: handwritten digits classification - MNIST

MNIST = Mixed National Institute of Standards and Technology. Download the dataset at http://yann.lecun.com/exdb/mnist/
Very simple model: softmax classification

Each 28x28 image is flattened into 784 pixels. Each of the 10 output neurons (one per digit, 0 1 2 ... 9) computes a weighted sum of all the pixels, adds a bias, and the 10 results go through softmax.

In matrix notation, 100 images at a time

X: 100 images, one flattened image (784 pixels) per line -> X[100, 784]
W: the weight matrix, 784 lines x 10 columns (one column per output neuron) -> W[784, 10]
X x W: the weighted sums for the whole batch -> L[100, 10]
b: the bias vector b[10] is broadcast, i.e. the same biases are added on all 100 lines
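A quick shape check of the batched version (a sketch using NumPy, which is not part of the slides; the broadcasting of b works the same way in TensorFlow):

import numpy as np
X = np.random.rand(100, 784)     # 100 flattened images
W = np.zeros((784, 10))          # weights
b = np.zeros(10)                 # biases
L = np.matmul(X, W) + b          # b is broadcast onto every line
print(L.shape)                   # (100, 10)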
Softmax, on a batch of images

Y = softmax(X x W + b)

Predictions Y[100, 10]    Images X[100, 784]    Weights W[784, 10]    Biases b[10]

matrix multiply, biases broadcast on all lines, softmax applied line by line (tensor shapes in [ ])


Now in TensorFlow (Python)

tensor shapes: X[100, 784]  W[784, 10]  b[10]

Y = tf.nn.softmax(tf.matmul(X, W) + b)

matrix multiply, broadcast of biases on all lines


Success ?

Cross entropy compares the computed probabilities with the actual probabilities, "one-hot" encoded (this image is a "6"):

digits:                           0    1    2    3    4    5    6    7    8    9
actual probabilities (one-hot):   0    0    0    0    0    0    1    0    0    0
computed probabilities:           0.1  0.2  0.1  0.3  0.2  0.1  0.9  0.2  0.1  0.1
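With the illustrative numbers above, cross entropy is -sum(actual_i * log(computed_i)); only the "6" term contributes. A minimal check in plain Python (not from the slides):

import math
actual   = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]                      # one-hot: this is a "6"
computed = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.9, 0.2, 0.1, 0.1]
cross_entropy = -sum(a * math.log(c) for a, c in zip(actual, computed))
print(cross_entropy)                                            # -log(0.9), about 0.105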


Demo: 92%

TensorFlow - initialisation

import tensorflow as tf

X = tf.placeholder(tf.float32, [None, 28, 28, 1])   # None will become the batch size, 100; 28 x 28 grayscale images
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

init = tf.initialize_all_variables()

Training = computing variables W and b


TensorFlow - success metrics

# model (images flattened to 784 pixels)
Y = tf.nn.softmax(tf.matmul(tf.reshape(X, [-1, 784]), W) + b)

# placeholder for correct answers ("one-hot" encoded)
Y_ = tf.placeholder(tf.float32, [None, 10])

# loss function
cross_entropy = -tf.reduce_sum(Y_ * tf.log(Y))

# % of correct answers found in batch (argmax does the "one-hot" decoding)
is_correct = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))


TensorFlow - training

optimizer = tf.train.GradientDescentOptimizer(0.003)   # 0.003 is the learning rate
train_step = optimizer.minimize(cross_entropy)          # cross_entropy is the loss function


TensorFlow - run !

Running a TensorFlow computation means opening a session and feeding the placeholders.

sess = tf.Session()
sess.run(init)

for i in range(1000):
    # load batch of images and correct answers
    batch_X, batch_Y = mnist.train.next_batch(100)
    train_data = {X: batch_X, Y_: batch_Y}

    # train
    sess.run(train_step, feed_dict=train_data)

    # success ? (tip: do this every 100 iterations only)
    a, c = sess.run([accuracy, cross_entropy], feed_dict=train_data)

    # success on test data ?
    test_data = {X: mnist.test.images, Y_: mnist.test.labels}
    a, c = sess.run([accuracy, cross_entropy], feed_dict=test_data)
TensorFlow - full python code

import tensorflow as tf

# initialisation
X = tf.placeholder(tf.float32, [None, 28, 28, 1])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
init = tf.initialize_all_variables()

# model
Y = tf.nn.softmax(tf.matmul(tf.reshape(X, [-1, 784]), W) + b)

# placeholder for correct answers
Y_ = tf.placeholder(tf.float32, [None, 10])

# loss function
cross_entropy = -tf.reduce_sum(Y_ * tf.log(Y))

# % of correct answers found in batch
is_correct = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

# training step
optimizer = tf.train.GradientDescentOptimizer(0.003)
train_step = optimizer.minimize(cross_entropy)

# run
sess = tf.Session()
sess.run(init)

for i in range(10000):
    # load batch of images and correct answers
    batch_X, batch_Y = mnist.train.next_batch(100)
    train_data = {X: batch_X, Y_: batch_Y}

    # train
    sess.run(train_step, feed_dict=train_data)

    # success ? (add code to print it)
    a, c = sess.run([accuracy, cross_entropy], feed_dict=train_data)

    # success on test data ?
    test_data = {X: mnist.test.images, Y_: mnist.test.labels}
    a, c = sess.run([accuracy, cross_entropy], feed_dict=test_data)
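Note: the snippets assume the MNIST dataset has already been loaded into mnist (the workshop starter code on GitHub takes care of this). One way to do it with the TensorFlow tutorial helper of that era is sketched below; reshape=False keeps the images as [28, 28, 1], matching the X placeholder:

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("data", one_hot=True, reshape=False)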


Cookbook

Softmax
Cross-entropy
Mini-batch
Go deep !

Let's try 5 fully-connected layers ! (overkill ;-)
Layer sizes: 784 -> 200 -> 100 -> 60 -> 30 -> 10
sigmoid activation on the hidden layers, softmax on the last layer (outputs 0 1 2 ... 9)


TensorFlow - initialisation

K = 200
L = 100
M = 60
N = 30

# weights initialised with small random values, biases with zeros
W1 = tf.Variable(tf.truncated_normal([28*28, K], stddev=0.1))
B1 = tf.Variable(tf.zeros([K]))
W2 = tf.Variable(tf.truncated_normal([K, L], stddev=0.1))
B2 = tf.Variable(tf.zeros([L]))
W3 = tf.Variable(tf.truncated_normal([L, M], stddev=0.1))
B3 = tf.Variable(tf.zeros([M]))
W4 = tf.Variable(tf.truncated_normal([M, N], stddev=0.1))
B4 = tf.Variable(tf.zeros([N]))
W5 = tf.Variable(tf.truncated_normal([N, 10], stddev=0.1))
B5 = tf.Variable(tf.zeros([10]))


TensorFlow - the model (using the weights and biases above)

X = tf.reshape(X, [-1, 28*28])

Y1 = tf.nn.sigmoid(tf.matmul(X, W1) + B1)
Y2 = tf.nn.sigmoid(tf.matmul(Y1, W2) + B2)
Y3 = tf.nn.sigmoid(tf.matmul(Y2, W3) + B3)
Y4 = tf.nn.sigmoid(tf.matmul(Y3, W4) + B4)
Y  = tf.nn.softmax(tf.matmul(Y4, W5) + B5)


Demo - slow start ?

RELU !
RELU = Rectified Linear Unit

Y = tf.nn.relu(tf.matmul(X, W) + b)
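For reference (not on the slide): relu(x) = max(0, x), i.e. negative weighted sums are clamped to zero and positive ones pass through unchanged, which avoids the vanishing gradients that make the sigmoid network start slowly.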


Demo - noisy accuracy curve ? Yuck!

Slow down... learning rate decay

Learning rate 0.003 at start, then dropping exponentially to 0.0001
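One way to wire this in (a sketch, not on the slide; it uses the decay formula given in the workshop section at the end, lr = lrmin + (lrmax - lrmin) * exp(-i/2000), and assumes a new lr placeholder):

import math
lr = tf.placeholder(tf.float32)
train_step = tf.train.GradientDescentOptimizer(lr).minimize(cross_entropy)

# inside the training loop, i is the iteration number
learning_rate = 0.0001 + (0.003 - 0.0001) * math.exp(-i / 2000.0)
sess.run(train_step, feed_dict={X: batch_X, Y_: batch_Y, lr: learning_rate})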
Demo - dying neurons

Dying... dying... dead.


Dropout

pkeep = tf.placeholder(tf.float32)    # TRAINING: pkeep=0.75    EVALUATION: pkeep=1

Yf = tf.nn.relu(tf.matmul(X, W) + B)
Y  = tf.nn.dropout(Yf, pkeep)
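In the training loop this means feeding pkeep alongside the images (a sketch, not on the slide, reusing the variable names from the earlier snippets):

# training step: drop 25% of the neurons
sess.run(train_step, feed_dict={X: batch_X, Y_: batch_Y, pkeep: 0.75})

# evaluation: keep all neurons
a, c = sess.run([accuracy, cross_entropy],
                feed_dict={X: mnist.test.images, Y_: mnist.test.labels, pkeep: 1.0})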
With dropout, the dying-neuron problem largely goes away.
Demo: 98%
All the party tricks

Configurations compared: sigmoid with learning rate 0.003; RELU with learning rate 0.003; decaying learning rate 0.003 -> 0.0001 plus dropout 0.75. Best results: 98.2% peak accuracy, 97.9% sustained.


Overfitting

The cross-entropy loss on test data starts going back up: overfitting. Too many neurons ?!? A BAD network? Or not enough DATA?
Convolutional layer

A small filter is slid across the image (convolution, with padding); subsampling is obtained by moving the filter with a stride. Example: two filters W1[4, 4, 3] and W2[4, 4, 3] stack into a weight tensor W[4, 4, 3, 2] - filter size 4x4, 3 input channels, 2 output channels.


Hacker's tip: go ALL convolutional
Convolutional neural network (+ biases on all layers)

28x28x1    input image
28x28x4    convolutional layer, 4 channels     W1[5, 5, 1, 4]    stride 1
14x14x8    convolutional layer, 8 channels     W2[4, 4, 4, 8]    stride 2
7x7x12     convolutional layer, 12 channels    W3[4, 4, 8, 12]   stride 2
200        fully connected layer               W4[7x7x12, 200]
10         softmax readout layer               W5[200, 10]
TensorFlow - initialisation

K = 4      # channels in the first convolutional layer
L = 8      # channels in the second convolutional layer
M = 12     # channels in the third convolutional layer
N = 200    # neurons in the fully connected layer

# convolutional weight shapes: [filter size, filter size, input channels, output channels]
# weights initialised with small random values
W1 = tf.Variable(tf.truncated_normal([5, 5, 1, K], stddev=0.1))
B1 = tf.Variable(tf.ones([K])/10)
W2 = tf.Variable(tf.truncated_normal([5, 5, K, L], stddev=0.1))
B2 = tf.Variable(tf.ones([L])/10)
W3 = tf.Variable(tf.truncated_normal([4, 4, L, M], stddev=0.1))
B3 = tf.Variable(tf.ones([M])/10)
W4 = tf.Variable(tf.truncated_normal([7*7*M, N], stddev=0.1))
B4 = tf.Variable(tf.ones([N])/10)
W5 = tf.Variable(tf.truncated_normal([N, 10], stddev=0.1))
B5 = tf.Variable(tf.zeros([10]))
TensorFlow - the model

# X is the input image batch: X[100, 28, 28, 1]
Y1 = tf.nn.relu(tf.nn.conv2d(X,  W1, strides=[1, 1, 1, 1], padding='SAME') + B1)
Y2 = tf.nn.relu(tf.nn.conv2d(Y1, W2, strides=[1, 2, 2, 1], padding='SAME') + B2)
Y3 = tf.nn.relu(tf.nn.conv2d(Y2, W3, strides=[1, 2, 2, 1], padding='SAME') + B3)

# flatten all values for the fully connected layer: Y3[100, 7, 7, 12] -> YY[100, 7*7*12]
YY = tf.reshape(Y3, shape=[-1, 7 * 7 * M])

Y4 = tf.nn.relu(tf.matmul(YY, W4) + B4)
Y  = tf.nn.softmax(tf.matmul(Y4, W5) + B5)


Demo: 98.9%

WTFH ???


Bigger convolutional network + dropout (+ biases on all layers)

28x28x1    input image
28x28x6    convolutional layer, 6 channels     W1[6, 6, 1, 6]     stride 1
14x14x12   convolutional layer, 12 channels    W2[5, 5, 6, 12]    stride 2
7x7x24     convolutional layer, 24 channels    W3[4, 4, 12, 24]   stride 2
200        fully connected layer               W4[7x7x24, 200]    + DROPOUT p=0.75
10         softmax readout layer               W5[200, 10]
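Not on the original slides: a minimal sketch of this bigger model, assuming the same X and pkeep placeholders as before (the full version is in mnist_3.0_convolutional_bigger_dropout.py on GitHub):

K, L, M = 6, 12, 24   # output channels of the three convolutional layers
N = 200               # neurons in the fully connected layer

W1 = tf.Variable(tf.truncated_normal([6, 6, 1, K], stddev=0.1)); B1 = tf.Variable(tf.ones([K])/10)
W2 = tf.Variable(tf.truncated_normal([5, 5, K, L], stddev=0.1)); B2 = tf.Variable(tf.ones([L])/10)
W3 = tf.Variable(tf.truncated_normal([4, 4, L, M], stddev=0.1)); B3 = tf.Variable(tf.ones([M])/10)
W4 = tf.Variable(tf.truncated_normal([7*7*M, N], stddev=0.1));   B4 = tf.Variable(tf.ones([N])/10)
W5 = tf.Variable(tf.truncated_normal([N, 10], stddev=0.1));      B5 = tf.Variable(tf.zeros([10]))

Y1 = tf.nn.relu(tf.nn.conv2d(X,  W1, strides=[1, 1, 1, 1], padding='SAME') + B1)   # 28x28x6
Y2 = tf.nn.relu(tf.nn.conv2d(Y1, W2, strides=[1, 2, 2, 1], padding='SAME') + B2)   # 14x14x12
Y3 = tf.nn.relu(tf.nn.conv2d(Y2, W3, strides=[1, 2, 2, 1], padding='SAME') + B3)   # 7x7x24
YY = tf.reshape(Y3, shape=[-1, 7*7*M])
Y4 = tf.nn.relu(tf.matmul(YY, W4) + B4)
Y4d = tf.nn.dropout(Y4, pkeep)               # dropout on the fully connected layer only
Y  = tf.nn.softmax(tf.matmul(Y4d, W5) + B5)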
Demo: 99.3%

YEAH ! (with dropout)
The cookbook, complete:

Softmax
Cross-entropy
Mini-batch
Go deep
RELU !
Learning rate decay
Dropout
ALL convolutional
Overfitting ?!? Too many neurons, a BAD network, or not enough DATA.

Cartoon images copyright: alexpokusay / 123RF stock photos
Have fun !

All code snippets are on GitHub: github.com/martin-gorner/tensorflow-mnist-tutorial
This presentation: goo.gl/pHeXe7
tensorflow.org

Cloud ML (ALPHA): your TensorFlow models trained in Google's cloud, fast. - cloud.google.com
Pre-trained models: Cloud Vision API, Cloud Speech API (ALPHA), Google Translate API

That's all folks...

Martin Görner
Google Developer relations
@martin_gorner
plus.google.com/+MartinGorner
Workshop

Keyboard shortcuts for the visualisation GUI:

1 ......... display 1st graph only
2 ......... display 2nd graph only
3 ......... display 3rd graph only
4 ......... display 4th graph only
5 ......... display 5th graph only
6 ......... display 6th graph only
7 ......... display graphs 1 and 2
8 ......... display graphs 4 and 5
9 ......... display graphs 3 and 6
ESC or 0 .. back to displaying all graphs
SPACE ..... pause/resume
O ......... box zoom mode (then use mouse)
H ......... reset all zooms
Ctrl-S .... save current image


Workshop

Starter code and solutions: github.com/martin-gorner/tensorflow-mnist-tutorial

1. Theory (sit back and listen)
Softmax classifier, mini-batch, cross-entropy and how to implement them in TensorFlow (slides 1-14).

2. Practice
Open file: mnist_1.0_softmax.py
Run it, play with the visualisations (see instructions on the previous slide), read and understand the code as well as the basic structure of a TensorFlow program.

3. Theory (sit back and listen)
Hidden layers, sigmoid activation function (slides 16-19).

4. Practice
Start from the file you have and add one or two hidden layers. Use cross_entropy_with_logits to avoid numerical instabilities with log(0).
Solution in: mnist_2.0_five_layers_sigmoid.py

5. Theory (sit back and listen)
The neural network toolbox: RELUs, learning rate decay, dropout, overfitting (slides 20-35).

6. Practice
Replace all your sigmoids with RELUs. Test. Then add learning rate decay from 0.003 to 0.0001 using the formula lr = lrmin + (lrmax - lrmin)*exp(-i/2000).
Solution in: mnist_2.1_five_layers_relu_lrdecay.py

7. Practice (if time allows)
Add dropout on all layers using a value between 0.5 and 0.8 for pkeep.
Solution in: mnist_2.2_five_layers_relu_lrdecay_dropout.py

8. Theory (sit back and listen)
Convolutional networks (slides 36-42).

9. Practice
Replace your model with a convolutional network, without dropout.
Solution in: mnist_3.0_convolutional.py

10. Practice (if time allows)
Try a bigger neural network (good hyperparameters on slide 44) and add dropout on the last layer.
Solution in: mnist_3.0_convolutional_bigger_dropout.py
