Thushan Ganegedara
June 6th, 2018
TensorBoard Tutorial
Visualize the training parameters, metrics, hyperparameters, or any other
statistics of your neural network with TensorBoard!
This tutorial will guide you on how to use TensorBoard, which is an amazing
utility that allows you to visualize data and how it behaves. You will see what
sorts of purposes it serves when training a neural network.
• First, you will learn how to start TensorBoard, followed by an overview of the
different views offered.
• Next, you will see how you can visualize scalar values produced during
computations. You will also learn how to get insights from the model to fix any
potential errors in the learning.
• Thereafter, you will investigate how you can visualize vectors or collections of
data as histograms.
• With this view, you will compare how the weight initialization of the neural
network affects its weight updates during learning.
Tip: check out DataCamp's Deep Learning course with Keras here.
Before you get started, make sure to import the following libraries to run the
code successfully:
from pandas_datareader import data
import matplotlib.pyplot as plt
import pandas as pd
import datetime as dt
import os
import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
Starting TensorBoard
To visualize things via TensorBoard, you first need to start its service. For that,
3. If you are using a Python virtualenv, activate the virtual environment you have
installed TensorFlow in
4. Make sure that you can see the TensorFlow library through Python. For that,
◦ Type in python3 ; you will get a >>> looking prompt
◦ Type import tensorflow as tf ; if it imports without errors, TensorFlow is visible
5. Exit the Python prompt (that is, >>> ) by typing exit() and type in the
following command
◦ tensorboard --logdir=summaries
◦ Files that TensorBoard saves data into are called event files
◦ The type of data saved into the event files is called summary data
Note: TensorBoard does not like to see multiple event files in the same directory.
This can lead to very gruesome-looking curves on the display. So you should
create a separate folder for each different example (for example,
summaries/first, summaries/second, ...) to save data. Another thing to keep in
mind is that if you want to re-run an experiment (that is, save an event file to
an already populated folder), you have to make sure to delete the existing
event files first.
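If you want to keep the event files of each run separate without any manual cleanup, one convenient pattern is to create a fresh sub-directory per run and point a dedicated writer at it. The snippet below is a minimal sketch of that idea (the make_run_dir helper and the run names are illustrative, not part of the tutorial's code); it relies on the imports listed above.
def make_run_dir(base_dir, run_name):
    """Create (if needed) and return a dedicated folder for one run's event files."""
    run_dir = os.path.join(base_dir, run_name)
    if not os.path.exists(run_dir):
        os.makedirs(run_dir)
    return run_dir
# One writer per experiment keeps event files from mixing in a single folder
first_run_dir = make_run_dir('summaries', 'first')
writer = tf.summary.FileWriter(first_run_dir)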
• Graph - Visualize the computational graph of your model, such as the neural
network model.
• Distributions - Visualize how data changes over time, such as the weights of a
neural network.
For example, if you carelessly initialize the weights of a deep neural network with
a very large variance, your model will quickly diverge and
collapse. On the other hand, things can go wrong even when you are quite
experienced at taming neural networks. For example, not
paying attention to the learning rate can lead to either the divergence of the
model or premature saturation at sub-optimal performance.
One way to quickly detect problems with your model is to have a graphical
visualization of what's going on in your model in real time (for example, every
100 iterations). So if your model is behaving oddly, it will be clearly visible. That
is exactly what TensorBoard provides you with. You can decide which values
need to be displayed and it will maintain a real time visualization of those values
during learning.
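As a rough preview of the pattern the rest of this tutorial builds up (the names loss_value, log_every and the 'summaries/preview' folder are illustrative only, and the sketch relies on the imports above), this is all it takes to get a live curve in TensorBoard: evaluate a tf.summary.scalar periodically and hand the result to a FileWriter.
tf.reset_default_graph()
# A placeholder you feed whatever value you want plotted (e.g. the current loss)
loss_value = tf.placeholder(tf.float32, shape=None, name='loss_value')
loss_summary = tf.summary.scalar('loss', loss_value)
writer = tf.summary.FileWriter('summaries/preview')
log_every = 100  # record a point every 100 iterations
with tf.Session() as sess:
    for step in range(1000):
        current_loss = 1.0 / (step + 1)   # stand-in for a real training loss
        if step % log_every == 0:
            summ = sess.run(loss_summary, feed_dict={loss_value: current_loss})
            writer.add_summary(summ, step)
writer.close()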
You start by first creating a five-layer neural network that you will use to classify
hand-written digit images. For that you will use the famous MNIST dataset.
TensorFlow provides a simple API to load MNIST data, so you don't have to
download it manually. Before that, you define a simple method (that is,
accuracy() ), which calculates the accuracy of a set of predictions with respect
to the true labels.
def accuracy(predictions,labels):
    '''
    Accuracy of a given set of predictions of size (N x n_classes) and
    labels of size (N x n_classes)
    '''
    return np.sum(np.argmax(predictions,axis=1)==np.argmax(labels,axis=1))*100.0/labels.shape[0]
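As a quick sanity check (the arrays below are made up purely for illustration), you can call accuracy() on a small batch of softmax-like predictions and one-hot labels:
# Hypothetical 3-sample, 3-class example: the first two predictions match the labels
preds = np.array([[0.8, 0.1, 0.1],
                  [0.1, 0.7, 0.2],
                  [0.5, 0.3, 0.2]])
labels = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])
print(accuracy(preds, labels))  # ~66.67, since two of the three argmaxes match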
First, define a batch_size denoting the amount of data you sample at a single
optimization/validation or testing step. Then you define the layer_ids , which
gives an identifier for each of the layers of the neural network you will be
defining. You then can define layer_sizes .
MNIST has images of size 28x28, which will be 784 when unwrapped to a single
dimension. Then you can define the input and label placeholders, that you will
later use to train the model. Finally, you define two TensorFlow variables for
each layer (that is, weights and bias ).
You can use variable scoping (more information here) so that the variables will
be nicely named and will be much easier to access later.
batch_size = 100
layer_ids = ['hidden1','hidden2','hidden3','hidden4','hidden5','out']
layer_sizes = [784, 500, 400, 300, 200, 100, 10]

tf.reset_default_graph()

# Inputs and Labels
train_inputs = tf.placeholder(tf.float32, shape=[batch_size, layer_sizes[0]], name='train_inputs')
train_labels = tf.placeholder(tf.float32, shape=[batch_size, layer_sizes[-1]], name='train_labels')

# Weights and Biases of each layer, defined within that layer's variable scope
for idx, lid in enumerate(layer_ids):
    with tf.variable_scope(lid):
        w = tf.get_variable('weights',shape=[layer_sizes[idx], layer_sizes[idx+1]],
                            initializer=tf.truncated_normal_initializer(stddev=0.05))
        b = tf.get_variable('bias',shape=[layer_sizes[idx+1]],
                            initializer=tf.random_uniform_initializer(-0.1,0.1))
With the input/output placeholders and the weights and biases of each layer defined,
you can now define the computations that calculate the logits of the neural network.
Logits are the unnormalized values produced in the last layer of the neural
network. When normalized, you call them predictions. This involves iterating
through each layer in the neural network and computing tf.matmul(h,w) + b .
You also need to apply an activation function like
tf.nn.relu(tf.matmul(h,w) + b) for all layers except the last one.
Next, you define the loss function that is used to optimize the neural network. In
this example, you can use the cross entropy loss, which often delivers better
results in classification problems than the mean squared error.
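To make the cross-entropy loss concrete (the numbers below are made up for illustration and use the np import from above), here is how it is computed for a single example with a one-hot label:
# One-hot label: the true class is class 1
label = np.array([0., 1., 0.])
# Softmax output of the network (a hypothetical prediction)
prediction = np.array([0.1, 0.7, 0.2])
# Cross entropy: -sum(label * log(prediction)); only the true class contributes
cross_entropy = -np.sum(label * np.log(prediction))
print(cross_entropy)  # ~0.357; it approaches 0 as the prediction for class 1 approaches 1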
Finally, you will need to define an optimizer that takes in the loss and updates
the weights of the neural network in the direction that minimizes the loss.
# Calculating Logits
h = train_inputs
for lid in layer_ids:
    with tf.variable_scope(lid,reuse=True):
        w, b = tf.get_variable('weights'), tf.get_variable('bias')
        if lid != 'out':
            h = tf.nn.relu(tf.matmul(h,w)+b,name=lid+'_output')
        else:
            h = tf.nn.xw_plus_b(h,w,b,name=lid+'_output')

tf_predictions = tf.nn.softmax(h, name='predictions')

# Calculating Loss
tf_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=train_labels, logits=h), name='loss')

# Optimizer
tf_learning_rate = tf.placeholder(tf.float32, shape=None, name='learning_rate')
optimizer = tf.train.MomentumOptimizer(tf_learning_rate,momentum=0.9)
grads_and_vars = optimizer.compute_gradients(tf_loss)
tf_loss_minimize = optimizer.minimize(tf_loss)
Defining Summaries
Here you can define the tf.summary objects. These objects are the type of
entities understood by TensorBoard. This means that whatever value you'd like
to be displayed, you should encapsulate it as a tf.summary object.
There are several different types of summaries. Here, as you are visualizing only
scalars, you can define tf.summary.scalar objects. Furthermore, you can use
tf.name_scope to group scalars on the board. That is, scalars having the same
name scope will be displayed on the same row. Here you define three different
summaries.
# Summaries having the same name_scope will be displayed on the same row
with tf.name_scope('performance'):
    # Summaries need to be displayed
    # Whenever you need to record the loss, feed the mean loss to this placeholder
    tf_loss_ph = tf.placeholder(tf.float32,shape=None,name='loss_summary')
    tf_loss_summary = tf.summary.scalar('loss', tf_loss_ph)
    # Whenever you need to record the accuracy, feed the mean test accuracy to this placeholder
    tf_accuracy_ph = tf.placeholder(tf.float32,shape=None,name='accuracy_summary')
    tf_accuracy_summary = tf.summary.scalar('accuracy', tf_accuracy_ph)

# Gradient norm summary of the last hidden layer's weights
for g, v in grads_and_vars:
    if 'hidden5' in v.name and 'weights' in v.name:
        with tf.name_scope('gradients'):
            tf_last_grad_norm = tf.sqrt(tf.reduce_mean(g**2))
            tf_gradnorm_summary = tf.summary.scalar('grad_norm', tf_last_grad_norm)
            break

# Merge all summaries together
performance_summaries = tf.summary.merge([tf_loss_summary,tf_accuracy_summary])
Executing the neural network: Loading Data, Training, Validation and Testing
In the code below you do the following. First, you create a session, in which you
execute the operations you defined above. Then, you create a folder for saving
summary data. Next, you create a summary writer summ_writer . You can now
initialize all variables. This will be followed by loading the MNIST dataset.
Then, for each epoch, and each batch in the training data (that is, each iteration),
execute gradnorm_summary if it is the first iteration and write
gradnorm_summary to the event file with the summary writer. You now execute
the model optimization and loss calculation. After you go through the full
training dataset for a single epoch, calculate the average training loss.
You follow a similar treatment for the validation dataset. Specifically, for
each batch in the validation data, you calculate the validation accuracy, and
thereafter you calculate the average validation accuracy over the full validation set.
Finally, the testing phase is executed: for each batch in the test data, you
calculate the test accuracy, and then the average test accuracy over the full test
set. At the very end you execute the performance_summaries and write them to
the event file with the summary writer.
image_size = 28
n_channels = 1
n_classes = 10
n_train = 55000
n_valid = 5000
n_test = 10000
n_epochs = 25
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.9 # making sure TensorFlow doesn't overflow the GPU memory
session = tf.InteractiveSession(config=config)
if not os.path.exists('summaries'):
    os.mkdir('summaries')
if not os.path.exists(os.path.join('summaries','first')):
    os.mkdir(os.path.join('summaries','first'))
summ_writer = tf.summary.FileWriter(os.path.join('summaries','first'), session.graph)
tf.global_variables_initializer().run()
accuracy_per_epoch = []
mnist_data = input_data.read_data_sets('MNIST_data', one_hot=True)

for epoch in range(n_epochs):
    loss_per_epoch = []
    for i in range(n_train//batch_size):
        batch = mnist_data.train.next_batch(batch_size)    # get one batch of training data
        if i == 0:
            # Only for the first iteration of each epoch, get the summary data
            l, _, gn_summ = session.run([tf_loss, tf_loss_minimize, tf_gradnorm_summary],
                                        feed_dict={train_inputs: batch[0].reshape(batch_size, image_size*image_size),
                                                   train_labels: batch[1],
                                                   tf_learning_rate: 0.0001})
            summ_writer.add_summary(gn_summ, epoch)
        else:
            # Optimize with training data
            l, _ = session.run([tf_loss, tf_loss_minimize],
                               feed_dict={train_inputs: batch[0].reshape(batch_size, image_size*image_size),
                                          train_labels: batch[1],
                                          tf_learning_rate: 0.0001})
        loss_per_epoch.append(l)
    print('Average loss in epoch %d: %.5f'%(epoch,np.mean(loss_per_epoch)))
    avg_loss = np.mean(loss_per_epoch)

    # Validation accuracy over the full validation set
    valid_accuracy_per_epoch = []
    for i in range(n_valid//batch_size):
        valid_images,valid_labels = mnist_data.validation.next_batch(batch_size)
        valid_batch_predictions = session.run(
            tf_predictions,feed_dict={train_inputs: valid_images.reshape(batch_size,image_size*image_size)})
        valid_accuracy_per_epoch.append(accuracy(valid_batch_predictions,valid_labels))
    mean_v_acc = np.mean(valid_accuracy_per_epoch)
    print('\tAverage Valid Accuracy in epoch %d: %.5f'%(epoch,mean_v_acc))

    # Test accuracy over the full test set
    accuracy_per_epoch = []
    for i in range(n_test//batch_size):
        test_images, test_labels = mnist_data.test.next_batch(batch_size)
        test_batch_predictions = session.run(
            tf_predictions,feed_dict={train_inputs: test_images.reshape(batch_size,image_size*image_size)})
        accuracy_per_epoch.append(accuracy(test_batch_predictions,test_labels))
    print('\tAverage Test Accuracy in epoch %d: %.5f\n'%(epoch,np.mean(accuracy_per_epoch)))

    # Write the obtained summaries to the file, so they can be displayed in TensorBoard
    summ = session.run(performance_summaries,
                       feed_dict={tf_loss_ph: avg_loss, tf_accuracy_ph: np.mean(accuracy_per_epoch)})
    summ_writer.add_summary(summ, epoch)

session.close()
Extracting MNIST_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Average loss in epoch 0: 2.30252
...
...
...
First, you will see what the computational graph of your model looks like. You
can access this view by clicking on the Graphs tab in TensorBoard. It should
look like the image below. You can see that you have a nice flow from
train_inputs to loss and predictions , flowing through the hidden layers 1
to 5.
Visualize the Summary Data
MNIST classification is one of the simplest examples, so it should be easily
solvable with a five-layer neural network. For MNIST, it's not difficult to achieve
an accuracy of more than 90% in fewer than 5 epochs.
You can see that the accuracy is going up, but very slowly, and that the gradient
updates are increasing over time. This is an odd behavior. If you're reaching
towards convergence, you should see the gradients diminishing (approaching
zero), not increasing. But because the accuracy is going up, you're on the right
path. You probably need a higher learning rate.
You can now try a learning rate of 0.01 . This is almost identical to the previous
execution of the neural network, except that you will be using 0.01 instead of
0.0001 . Instead of tf_learning_rate: 0.0001 , use
tf_learning_rate: 0.01 . Note that there are two places where you
will need to replace the argument.
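One way to avoid hunting for both occurrences (a small refactoring sketch, not part of the original code; learning_rate_value is a hypothetical name) is to keep the learning rate in a single Python variable and reference it in both feed_dict calls:
learning_rate_value = 0.01   # previously 0.0001; change it in one place only
# Both session.run calls in the training loop can then share it, e.g.:
# feed_dict={train_inputs: batch[0].reshape(batch_size, image_size*image_size),
#            train_labels: batch[1],
#            tf_learning_rate: learning_rate_value}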
You can now see that the accuracy quickly climbs close to 100% and continues to
go up. You can also see that the gradient updates are diminishing over time and
approaching zero. Things seem much better with the learning rate of 0.01 .
Next, let's move beyond scalars. You will see how you can analyze vectors of
scalars and collections of scalars.
Beyond Scalars: Visualizing Histograms/Distributions
You saw the benefit of visualizing scalars through TensorBoard, which allowed
you to see how the model behaves and fix any potential issues with the model.
Moreover, visualizing the graph allowed you to see that there is an
uninterrupted link from the inputs to the predictions, which is necessary for
gradient calculations.
You will now see how the weights change during training in this example. If you
look at the code, it uses a truncated_normal_initializer() to initialize the weights.
Here you again define the tf.summary objects. However, now you are
visualizing vectors of scalars so you need to define tf.summary.histogram
objects.
In this case, you define two histogram objects (namely, tf_w_hist and
tf_b_hist ) that contain weights and biases of a given layer. You will define
such histogram objects for all the layers and each layer will have its own name
scope.
all_summaries = []
for lid in layer_ids:
    with tf.name_scope(lid+'_hist'):
        with tf.variable_scope(lid,reuse=True):
            w, b = tf.get_variable('weights'), tf.get_variable('bias')
            tf_w_hist = tf.summary.histogram('weights_hist', tf.reshape(w,[-1]))  # weights of this layer
            tf_b_hist = tf.summary.histogram('bias_hist', b)                      # bias of this layer
            all_summaries.extend([tf_w_hist, tf_b_hist])
tf_param_summaries = tf.summary.merge(all_summaries)
This step is almost the same as what you did before, but here you have a few
additional lines to compute the histogram summaries (that is,
tf_param_summaries ).
image_size = 28
n_channels = 1
n_classes = 10
n_train = 55000
n_valid = 5000
n_test = 10000
n_epochs = 25
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.9 # making sure TensorFlow doesn't overflow the GPU memory
session = tf.InteractiveSession(config=config)
if not os.path.exists('summaries'):
    os.mkdir('summaries')
if not os.path.exists(os.path.join('summaries','third')):
    os.mkdir(os.path.join('summaries','third'))
summ_writer_3 = tf.summary.FileWriter(os.path.join('summaries','third'), session.graph)
tf.global_variables_initializer().run()
accuracy_per_epoch = []
mnist_data = input_data.read_data_sets('MNIST_data', one_hot=True)

for epoch in range(n_epochs):
    loss_per_epoch = []
    for i in range(n_train//batch_size):
        batch = mnist_data.train.next_batch(batch_size)    # get one batch of training data
        if i == 0:
            # Only for the first iteration of each epoch, get the summary data
            l, _, gn_summ, wb_summ = session.run([tf_loss, tf_loss_minimize, tf_gradnorm_summary, tf_param_summaries],
                                                 feed_dict={train_inputs: batch[0].reshape(batch_size, image_size*image_size),
                                                            train_labels: batch[1],
                                                            tf_learning_rate: 0.01})
            summ_writer_3.add_summary(gn_summ, epoch)
            summ_writer_3.add_summary(wb_summ, epoch)
        else:
            # Optimize with training data
            l,_ = session.run([tf_loss,tf_loss_minimize],
                              feed_dict={train_inputs: batch[0].reshape(batch_size,image_size*image_size),
                                         train_labels: batch[1],
                                         tf_learning_rate: 0.01})
        loss_per_epoch.append(l)
    print('Average loss in epoch %d: %.5f'%(epoch,np.mean(loss_per_epoch)))
    avg_loss = np.mean(loss_per_epoch)

    # Validation accuracy over the full validation set
    valid_accuracy_per_epoch = []
    for i in range(n_valid//batch_size):
        valid_images,valid_labels = mnist_data.validation.next_batch(batch_size)
        valid_batch_predictions = session.run(
            tf_predictions,feed_dict={train_inputs: valid_images.reshape(batch_size,image_size*image_size)})
        valid_accuracy_per_epoch.append(accuracy(valid_batch_predictions,valid_labels))
    mean_v_acc = np.mean(valid_accuracy_per_epoch)
    print('\tAverage Valid Accuracy in epoch %d: %.5f'%(epoch,np.mean(valid_accuracy_per_epoch)))

    # Test accuracy over the full test set
    accuracy_per_epoch = []
    for i in range(n_test//batch_size):
        test_images, test_labels = mnist_data.test.next_batch(batch_size)
        test_batch_predictions = session.run(
            tf_predictions,feed_dict={train_inputs: test_images.reshape(batch_size,image_size*image_size)})
        accuracy_per_epoch.append(accuracy(test_batch_predictions,test_labels))
    print('\tAverage Test Accuracy in epoch %d: %.5f\n'%(epoch,np.mean(accuracy_per_epoch)))

    # Write the obtained summaries to the file, so they can be displayed in TensorBoard
    summ = session.run(performance_summaries,
                       feed_dict={tf_loss_ph: avg_loss, tf_accuracy_ph: np.mean(accuracy_per_epoch)})
    summ_writer_3.add_summary(summ, epoch)

session.close()
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
...
...
...
Here's what your weights and biases look like. There are three axes: time (x-axis),
value (y-axis) and frequency/density of values (z-axis). Darker histograms
represent older data and lighter histograms represent newer data. A higher
value on the z-axis means that the vector contains more values near that
particular value.
Note: you also have an "overlay" view of the histograms over time. You can
change the type of display in the option panel on the left.
The Effect of Different Initializers
Next, you will initialize the weights with the Xavier initializer instead of the
truncated normal initializer and compare the behavior. Instead of using a
user-defined standard deviation (as you did when using the
truncated_normal_initializer() ), Xavier initialization automatically decides
the standard deviation based on the number of input and output connections to a
layer. This helps gradients flow from the top to the bottom of the network
without issues like the vanishing gradient. You then define the model again.
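To get a feel for what "automatically decides the standard deviation" means, here is a rough sketch (using the np import from above) of the standard deviation the Glorot/Xavier scheme targets for this network's first layer, compared with the fixed 0.05 used earlier. The exact sampling distribution (uniform vs. normal) depends on the initializer variant, but both target this scale.
fan_in, fan_out = 784, 500   # input/output connections of the 'hidden1' layer
xavier_std = np.sqrt(2.0 / (fan_in + fan_out))   # standard deviation targeted by the Xavier scheme
print('Xavier std for hidden1: %.4f (vs. the fixed 0.05 used before)' % xavier_std)  # ~0.0395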
First, you define a batch_size denoting the amount of data you sample at a
single optimization/validation or testing step. You can then define the
layer_ids , which give an identifier for each of the layers of the neural network
you will be defining.
You can then define layer_sizes . Note that len(layer_sizes) should be
len(layer_ids)+1 , because layer_sizes includes the size of the input at the
beginning. MNIST has images of size 28x28, which will be 784 when unwrapped
to a single dimension.
Then, you can define the input and label placeholders, which you will later use to
train the model. Finally, you define two TensorFlow variables for each layer (that
is, weights and bias ).
Note: This is identical to the code you used the first time, except for the
initialization technique used for the weights.
batch_size = 100
layer_ids = ['hidden1','hidden2','hidden3','hidden4','hidden5','out']
layer_sizes = [784, 500, 400, 300, 200, 100, 10]
tf.reset_default_graph()
# Inputs and Labels
train_inputs = tf.placeholder(tf.float32, shape=[batch_size, layer_sizes[0]], name='train_inputs')
train_labels = tf.placeholder(tf.float32, shape=[batch_size, layer_sizes[-1]], name='train_labels')
# Weights and Biases of each layer, now initialized with the Xavier initializer
for idx, lid in enumerate(layer_ids):
    with tf.variable_scope(lid):
        w = tf.get_variable('weights',shape=[layer_sizes[idx], layer_sizes[idx+1]],
                            initializer=tf.contrib.layers.xavier_initializer())
        b = tf.get_variable('bias',shape=[layer_sizes[idx+1]],
                            initializer=tf.random_uniform_initializer(-0.1,0.1))
Calculating Logits, Predictions, Loss and Optimization
With the input/output placeholders, weights and biases of each layer defined,
you now can define the calculations to calculate the logits of the neural network
again.
Note: This part is identical to the code you used the first time you defined these
operations and tensors.
Define Summaries
Here you define the tf.summary objects again: the scalar summaries for the loss
and accuracy, and the tf.summary.histogram objects for the weights and biases
of each layer, since you are again visualizing vectors of scalars. This is
identical to the code you used the first time you defined these operations and
tensors.
Executing the neural network is the same as what you did in the previous section.
There are only a few bits of code that you need to change: the three occurrences
of os.path.join('summaries','third') become
os.path.join('summaries','fourth') , summ_writer_3 becomes summ_writer_4
(this appears four times), and make sure the tf_learning_rate is set to
0.01 .
image_size = 28
n_channels = 1
n_classes = 10
n_train = 55000
n_valid = 5000
n_test = 10000
n_epochs = 25
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True
session = tf.InteractiveSession(config=config)
if not os.path.exists('summaries'):
    os.mkdir('summaries')
if not os.path.exists(os.path.join('summaries','fourth')):
    os.mkdir(os.path.join('summaries','fourth'))
summ_writer_4 = tf.summary.FileWriter(os.path.join('summaries','fourth'), session.graph)
tf.global_variables_initializer().run()
accuracy_per_epoch = []
mnist_data = input_data.read_data_sets('MNIST_data', one_hot=True)

for epoch in range(n_epochs):
    loss_per_epoch = []
    for i in range(n_train//batch_size):
        batch = mnist_data.train.next_batch(batch_size)    # get one batch of training data
        if i == 0:
            # Only for the first iteration of each epoch, get the summary data
            l, _, gn_summ, wb_summ = session.run([tf_loss, tf_loss_minimize, tf_gradnorm_summary, tf_param_summaries],
                                                 feed_dict={train_inputs: batch[0].reshape(batch_size, image_size*image_size),
                                                            train_labels: batch[1],
                                                            tf_learning_rate: 0.01})
            summ_writer_4.add_summary(gn_summ, epoch)
            summ_writer_4.add_summary(wb_summ, epoch)
        else:
            # Optimize with training data
            l,_ = session.run([tf_loss,tf_loss_minimize],
                              feed_dict={train_inputs: batch[0].reshape(batch_size,image_size*image_size),
                                         train_labels: batch[1],
                                         tf_learning_rate: 0.01})
        loss_per_epoch.append(l)
    print('Average loss in epoch %d: %.5f'%(epoch,np.mean(loss_per_epoch)))
    avg_loss = np.mean(loss_per_epoch)

    # Validation accuracy over the full validation set
    valid_accuracy_per_epoch = []
    for i in range(n_valid//batch_size):
        valid_images,valid_labels = mnist_data.validation.next_batch(batch_size)
        valid_batch_predictions = session.run(
            tf_predictions,feed_dict={train_inputs: valid_images.reshape(batch_size,image_size*image_size)})
        valid_accuracy_per_epoch.append(accuracy(valid_batch_predictions,valid_labels))
    mean_v_acc = np.mean(valid_accuracy_per_epoch)
    print('\tAverage Valid Accuracy in epoch %d: %.5f'%(epoch,np.mean(valid_accuracy_per_epoch)))

    # Test accuracy over the full test set
    accuracy_per_epoch = []
    for i in range(n_test//batch_size):
        test_images, test_labels = mnist_data.test.next_batch(batch_size)
        test_batch_predictions = session.run(
            tf_predictions,feed_dict={train_inputs: test_images.reshape(batch_size,image_size*image_size)})
        accuracy_per_epoch.append(accuracy(test_batch_predictions,test_labels))
    print('\tAverage Test Accuracy in epoch %d: %.5f\n'%(epoch,np.mean(accuracy_per_epoch)))

    # Write the obtained summaries to the file, so they can be displayed in TensorBoard
    summ = session.run(performance_summaries,
                       feed_dict={tf_loss_ph: avg_loss, tf_accuracy_ph: np.mean(accuracy_per_epoch)})
    summ_writer_4.add_summary(summ, epoch)

session.close()
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
...
...
...
Here you can compare how the weights evolve over time for the two different
initializations: truncated_normal_initializer (red) and xavier_initializer
(blue). You can see that xavier_initializer keeps more weights away from
zero than the normal initializer, which is desirable. This potentially
allows the Xavier-initialized neural network to converge faster, as evidenced by
the loss/accuracy curves.
Distribution View of Histograms
You can now compare the two views: the histogram view
and the distribution view. The distribution view is essentially a different way of
looking at the histograms. If you look at the image below, you can easily see that
the distribution view is a top view of the histogram view. Note that the
histogram graphs are rotated in this case to make the resemblance easy to see.
Conclusion
In this tutorial, you saw how to use TensorBoard. First, you learned how to start
its service through the command prompt (Windows) or terminal (Ubuntu/Mac).
Next, you looked at different views of data provided by TensorBoard. You then
looked at code that visualizes scalar values (for example loss / accuracy) and
used a feed-forward neural network model to concretely understand the use of
the scalar value visualization.
Finally, you discussed the similarities between the distribution view and the
histogram view.
If you would like to learn more about deep learning, be sure to take a look at our
Deep Learning in Keras course.
If you'd like to get in touch with me, you can drop me an e-mail at
thushv@gmail.com or connect with me via LinkedIn.