

In [1]: import keras
import tensorflow as tf
print("keras version==", keras.__version__)
print("tensorflow version==", tf.__version__)

keras version== 2.4.3
tensorflow version== 2.2.0

A first look at a neural network


This notebook contains the code samples found in Chapter 2, Section 1 of Deep Learning with Python
(https://www.manning.com/books/deep-learning-with-python?a_aid=keras&a_bid=76564dff). Note that the
original text features far more content, in particular further explanations and figures: in this notebook, you will
only find source code and related comments.

We will now take a look at a first concrete example of a neural network, which makes use of the Python library
Keras to learn to classify hand-written digits. Unless you already have experience with Keras or similar libraries,
you will not understand everything about this first example right away. You probably haven't even installed Keras
yet. Don't worry, that is perfectly fine. In the next chapter, we will review each element in our example and
explain them in detail. So don't worry if some steps seem arbitrary or look like magic to you! We've got to start
somewhere.

The problem we are trying to solve here is to classify grayscale images of handwritten digits (28 pixels by 28
pixels), into their 10 categories (0 to 9). The dataset we will use is the MNIST dataset, a classic dataset in the
machine learning community, which has been around for almost as long as the field itself and has been very
intensively studied. It's a set of 60,000 training images, plus 10,000 test images, assembled by the National
Institute of Standards and Technology (the NIST in MNIST) in the 1980s. You can think of "solving" MNIST as the
"Hello World" of deep learning -- it's what you do to verify that your algorithms are working as expected. As you
become a machine learning practitioner, you will see MNIST come up over and over again, in scientific papers,
blog posts, and so on.

The MNIST dataset comes pre-loaded in Keras, in the form of a set of four Numpy arrays:

In [11]: from keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images and train_labels form the "training set", the data that the model will learn from. The model
will then be tested on the "test set", test_images and test_labels . Our images are encoded as Numpy
arrays, and the labels are simply an array of digits, ranging from 0 to 9. There is a one-to-one correspondence
between the images and the labels.

Let's have a look at the training data:


In [12]: train_images.shape

Out[12]: (60000, 28, 28)

In [13]: len(train_labels)

Out[13]: 60000

In [14]: train_labels

Out[14]: array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

Let's have a look at the test data:

In [15]: test_images.shape

Out[15]: (10000, 28, 28)

In [16]: len(test_labels)

Out[16]: 10000

In [17]: test_labels[0]

Out[17]: 7
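If you want to see what one of these samples actually looks like, here is a minimal sketch that displays the first training image together with its label (it assumes matplotlib is installed; matplotlib is also used for plotting later in this notebook):

In [ ]: import matplotlib.pyplot as plt

# Display the first training image (still a 28x28 uint8 array at this point)
# together with its label, to confirm the image/label correspondence.
plt.imshow(train_images[0], cmap='gray')
plt.title('label: %d' % train_labels[0])
plt.show()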

Our workflow will be as follows: first we will present our neural network with the training data, train_images and
train_labels . The network will then learn to associate images and labels. Finally, we will ask the network to
produce predictions for test_images , and we will verify if these predictions match the labels from
test_labels .

Let's build our network -- again, remember that you aren't supposed to understand everything about this example
just yet.

In [18]: from keras import models
from keras import layers

network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))


The core building block of neural networks is the "layer", a data-processing module which you can think of as a
"filter" for data: some data goes in, and it comes out in a more useful form. Specifically, layers extract
representations out of the data fed into them -- hopefully representations that are more meaningful for the
problem at hand. Most of deep learning really consists of chaining together simple layers which will implement a
form of progressive "data distillation". A deep learning model is like a sieve for data processing, made of a
succession of increasingly refined data filters -- the "layers".

Here our network consists of a sequence of two Dense layers, which are densely-connected (also called "fully-
connected") neural layers. The second (and last) layer is a 10-way "softmax" layer, which means it will return an
array of 10 probability scores (summing to 1). Each score will be the probability that the current digit image
belongs to one of our 10 digit classes.
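As an illustration of what that final softmax does (the network computes this internally; the sketch below is only for intuition, with made-up raw scores):

In [ ]: import numpy as np

# Hypothetical raw scores for the 10 classes, before the softmax.
z = np.array([2.0, 1.0, 0.1, 0.0, -1.0, 0.5, 0.3, 1.5, -0.5, 0.2])
probs = np.exp(z) / np.exp(z).sum()  # softmax
print(probs)        # 10 probability scores
print(probs.sum())  # they sum to 1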

To make our network ready for training, we need to pick three more things, as part of the "compilation" step:

- A loss function: this is how the network will be able to measure how good a job it is doing on its training data,
  and thus how it will be able to steer itself in the right direction.
- An optimizer: this is the mechanism through which the network will update itself based on the data it sees
  and its loss function.
- Metrics to monitor during training and testing. Here we will only care about accuracy (the fraction of the
  images that were correctly classified).

The exact purpose of the loss function and the optimizer will be made clear throughout the next two chapters.

In [19]: network.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
network.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_2 (Dense) (None, 512) 401920
_________________________________________________________________
dense_3 (Dense) (None, 10) 5130
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
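The parameter counts in this summary can be checked by hand: a Dense layer with n inputs and m units has n × m weights plus m biases. A quick sanity check:

In [ ]: # params = inputs * units + units (weights plus biases)
print(28 * 28 * 512 + 512)  # first Dense layer: 401920
print(512 * 10 + 10)        # second Dense layer: 5130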

Before training, we will preprocess our data by reshaping it into the shape that the network expects, and scaling
it so that all values are in the [0, 1] interval. Previously, our training images for instance were stored in an
array of shape (60000, 28, 28) of type uint8 with values in the [0, 255] interval. We transform it into a
float32 array of shape (60000, 28 * 28) with values between 0 and 1.

In [20]: train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255


We also need to categorically encode the labels, a step which we explain in chapter 3:

In [21]: from keras.utils import to_categorical

print(type(test_labels[0]))
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

<class 'numpy.uint8'>

In [22]: print(test_labels[0])

[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
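to_categorical simply one-hot encodes each digit. For illustration, the same encoding of the label 7 can be produced with plain NumPy (this is not how Keras does it internally, just an equivalent result):

In [ ]: import numpy as np

# One-hot encode the digit 7: a length-10 vector with a 1 at index 7.
print(np.eye(10)[7])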

We are now ready to train our network, which in Keras is done via a call to the fit method of the network: we
"fit" the model to its training data. First, we set aside the first 10,000 training samples as a validation set, so
that during training we can also monitor performance on data the network does not learn from.

In [36]: x_val = train_images[:10000]
partial_x_train = train_images[10000:]

y_val = train_labels[:10000]
partial_y_train = train_labels[10000:]
print(x_val[2].shape)

(784,)

In [25]: history = network.fit(partial_x_train, partial_y_train, epochs=1, batch_size=512, validation_data=(x_val, y_val))

98/98 [==============================] - 51s 516ms/step - loss: 0.4337 - accuracy: 0.8765 - val_loss: 0.2616 - val_accuracy: 0.9234

Four quantities are displayed during training: the "loss" and accuracy of the network over the training data, and
the corresponding val_loss and val_accuracy over the held-out validation set.

After a single epoch the training accuracy is already about 87.7%; over a full ten-epoch run (whose history is
printed below) it climbs to roughly 99%. Before checking how the model performs on the test set, let's inspect the
training history and save the model:



In [9]: history_dict = history.history
print(history_dict.keys())
print('\n', history.history)

dict_keys(['val_loss', 'val_accuracy', 'loss', 'accuracy'])

{'val_loss': [0.22377780916690826, 0.15834669768810272, 0.1319906724333763, 0.11759173978567124,
              0.09213001968860626, 0.09223344718217849, 0.0869313697040081, 0.0888035547554493,
              0.07904970963001251, 0.08422845186591148],
 'val_accuracy': [0.9381999969482422, 0.9555000066757202, 0.963100016117096, 0.9641000032424927,
                  0.972599983215332, 0.9726999998092651, 0.9747999906539917, 0.9710999727249146,
                  0.9771000146865845, 0.9753999710083008],
 'loss': [0.43176719602108, 0.19562156077384948, 0.13527957456111908, 0.10176401487350464,
          0.07936429808974266, 0.06305052213430405, 0.05131933896303177, 0.042019183785915376,
          0.03432257513403893, 0.028164280136823653],
 'accuracy': [0.87774, 0.94272, 0.96116, 0.97002, 0.97738, 0.98196, 0.98546, 0.98818, 0.9902, 0.9928]}

In [27]: network.save('mnist_model.h5')

In [39]: from keras.models import load_model

saved_network = load_model('mnist_model.h5')
print(saved_network.predict(x_val[2:]))
saved_network.summary()  # As a reminder.

[[1.50871649e-03 5.39984612e-04 1.05366884e-02 ... 1.46567961e-02
  5.88864880e-03 7.35504478e-02]
[3.51807212e-05 9.86623585e-01 1.39605126e-03 ... 3.02218454e-04
9.62954387e-03 1.17907824e-04]
[1.17348009e-05 9.41987091e-05 3.80176175e-06 ... 2.42470880e-03
1.59732159e-03 9.84995425e-01]
...
[8.74071557e-05 4.70288564e-03 2.27864226e-03 ... 2.56597332e-06
1.00377854e-02 7.38216331e-05]
[4.15157774e-05 1.78299932e-04 4.25769249e-05 ... 1.13392656e-03
1.47493305e-02 9.55478251e-01]
[3.39573657e-04 1.57570194e-05 3.99607015e-05 ... 8.79379153e-01
5.39453584e-04 1.18102841e-01]]
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_2 (Dense) (None, 512) 401920
_________________________________________________________________
dense_3 (Dense) (None, 10) 5130
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
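Each row of the prediction array above holds 10 class probabilities. To turn them into concrete digit predictions, take the index of the largest probability in each row (a minimal sketch, reusing the loaded model and validation images from the cell above):

In [ ]: import numpy as np

# Convert the probability rows into predicted digit classes.
predictions = saved_network.predict(x_val[2:])
predicted_digits = np.argmax(predictions, axis=1)
print(predicted_digits[:10])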


In [11]: import matplotlib
#matplotlib.use('WebAgg')
matplotlib.use('nbAgg')
import matplotlib.pyplot as plt

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

# "bo" is for "blue dot"
plt.plot(epochs, loss, 'bo', label='Training loss')
# "b" is for "solid blue line"
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

plt.show()


In [12]: plt.clf()
# "bo" is for "blue dot"
plt.plot(epochs, acc, 'bo', label='Training Accuracy')
# "b" is for "solid blue line"
plt.plot(epochs, val_acc, 'b', label='Validation Accuracy')
plt.title('Training and validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

plt.show()

In [13]: test_loss, test_acc = network.evaluate(test_images, test_labels)

10000/10000 [==============================] - 21s 2ms/step

In [22]: print('test_acc:', test_acc)

test_acc: 0.9804999828338623
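As a cross-check on what "accuracy" means here -- the fraction of images classified correctly -- you could recompute it by hand from the model's predictions (a sketch, assuming the preprocessed test_images and one-hot test_labels defined earlier):

In [ ]: import numpy as np

# Fraction of test images whose most probable class matches the true label.
pred = network.predict(test_images)
manual_acc = np.mean(np.argmax(pred, axis=1) == np.argmax(test_labels, axis=1))
print('manually computed accuracy:', manual_acc)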


Our test set accuracy turns out to be about 98.0% -- noticeably lower than the final training accuracy of about
99.3% reported above. This gap between training accuracy and test accuracy is an example of "overfitting", the fact
that machine learning models tend to perform worse on new data than on their training data. Overfitting will be a
central topic in chapter 3.

This concludes our very first example -- you just saw how we could build and train a neural network to classify
handwritten digits, in less than 20 lines of Python code. In the next chapter, we will go in detail over every moving
piece we just previewed, and clarify what is really going on behind the scenes. You will learn about "tensors", the
data-storing objects going into the network, about tensor operations, which layers are made of, and about
gradient descent, which allows our network to learn from its training examples.

