Experiment No. 04
Aim: To implement Handwritten Digit Recognition.
Theory:
Handwritten Character Recognition
Handwritten character recognition is a field of research in artificial intelligence, computer vision, and pattern
recognition. A computer performing handwriting recognition is said to be able to acquire and detect
characters in paper documents, pictures, touch-screen devices and other sources and convert them into
machine-encoded form. Its application is found in optical character recognition, transcription of handwritten
documents into digital documents and more advanced intelligent character recognition systems.
Handwritten character recognition can be thought of as a subset of the image recognition problem.

The general flow of an image recognition algorithm.

Basically, the algorithm takes an image of a handwritten digit as input and outputs the likelihood that the image belongs to each of the possible classes (the machine-encoded digits, 0–9).
The goal is to take an image of a handwritten digit and determine what that digit is. The digits range from zero (0) through nine (9).
Classical techniques such as Support Vector Machines (SVMs) and Nearest Neighbour (NN) classifiers can solve this problem; in this experiment we implement a simple neural-network classifier in Keras. The tasks involved are the following:
1. Download the MNIST dataset
2. Preprocess the MNIST dataset
3. Train a classifier that can categorize the handwritten digits
4. Apply the model on the test set and report its accuracy
The dataset for this problem is the famous MNIST (Modified National Institute of Standards and Technology) dataset; in the code below it is loaded directly through Keras, though it is also available on Kaggle.
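For reference, a minimal SVM baseline could be built with scikit-learn along the following lines. This is only a sketch, not part of the notebook below; fetch_openml('mnist_784') and the 10,000-example subset are illustrative choices to keep training time manageable.

# Illustrative SVM baseline for MNIST (assumes scikit-learn is installed).
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
X = X / 255.0  # scale raw pixel values to [0, 1]

# Train on a subset only, so the RBF-kernel SVM fits in reasonable time.
X_tr, X_te, y_tr, y_te = train_test_split(X[:10000], y[:10000],
                                          test_size=0.2, random_state=0)
clf = SVC(kernel='rbf')
clf.fit(X_tr, y_tr)
print(accuracy_score(y_te, clf.predict(X_te)))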

Metrics
We will be using the accuracy score to quantify the performance of our model. The accuracy tells us what percentage of the test data was classified correctly. Accuracy is a good metric choice because it makes our model's performance easy to compare with the benchmark, which uses the same metric. Also, our dataset is balanced (a roughly equal number of training examples for each label), which makes accuracy appropriate for this problem.
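Concretely, accuracy is just the fraction of predictions that match the true labels; a tiny sketch with hypothetical label arrays:

import numpy as np

# Hypothetical true and predicted labels; 4 of the 5 match, so accuracy = 0.8.
y_true = np.array([7, 2, 1, 0, 4])
y_hat  = np.array([7, 2, 1, 0, 9])
accuracy = np.mean(y_true == y_hat)
print(accuracy)  # 0.8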

Some examples of the dataset.


Exploratory Visualization

We have counted the number of occurrences of each label in the training set. The figure below illustrates the distribution of these labels. It is clear from the figure that the distribution is roughly uniform, meaning our dataset is balanced.

The number of occurrences of each label in the dataset.
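A count like this can be produced directly from the label array; a minimal sketch, assuming x_train and y_train as loaded in the code below:

# Count occurrences of each digit label and plot the distribution.
labels, counts = np.unique(y_train, return_counts=True)
plt.bar(labels, counts)
plt.xlabel("digit")
plt.ylabel("count")
plt.show()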

We would also like to know more about the average intensity, that is, the average value of a pixel in an image, for the different digits. Intuition suggests that the digit “1” will on average have less intensity than, say, an “8”.

The average intensity of each label in the dataset.

As we can see, there are differences in intensity, and our intuition was correct: “8” has a higher average intensity than “1”. Also, “0” has the highest intensity, even higher than “8”, which is surprising. This could be attributed to the fact that different people write their digits differently. Calculating the standard deviation of the per-digit intensities gives a value of 11.08, which shows that there is some variation in the way the digits are written.
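These per-digit averages can be computed along the following lines; a sketch over the raw 0–255 pixel values, using the x_train and y_train arrays loaded below:

# Mean pixel intensity for each digit class, and the spread across classes.
mean_intensity = np.array([x_train[y_train == d].mean() for d in range(10)])
print(mean_intensity)        # one average per digit 0-9
print(mean_intensity.std())  # standard deviation across digits (quoted above as 11.08)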
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import numpy as np

# Load MNIST: 60,000 training and 10,000 test images of 28x28 grayscale digits (labels 0-9).
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


11490434/11490434 [==============================] - 0s 0us/step

len(x_train)

60000

len(x_test)

10000

x_train[0].shape

(28, 28)

x_train[0]

array([[  0,   0,   0, ...,   0,   0,   0],
       [  0,   0,   0, ...,   0,   0,   0],
       ...,
       [  0,   0,   0, ...,   0,   0,   0]], dtype=uint8)

(28x28 array of raw pixel intensities in the range 0-255; full printout truncated)

plt.matshow(x_train[8])

<matplotlib.image.AxesImage at 0x79ed41e5d840>

y_train[2]

y_train[:5]

array([5, 0, 4, 1, 9], dtype=uint8)

# Normalize pixel values from 0-255 down to [0, 1].
x_train = x_train / 255
x_test = x_test / 255

# Flatten each 28x28 image into a 784-element vector for the Dense input layer.
x_train_flattened = x_train.reshape(len(x_train), 28*28)
x_train_flattened.shape

(60000, 784)

x_test_flattened = x_test.reshape(len(x_test),28*28)
x_test_flattened.shape

(10000, 784)

# Round the normalized pixels to 0/1 (exploratory; arr is not used later).
arr = np.around(x_train[0])

x_train_flattened[0]

array([0.        , 0.        , 0.        , ..., 0.        ,
       0.        , 0.        ])

(784 normalized pixel values in [0, 1]; full printout truncated)

# A minimal single-layer network: 10 output neurons (one per digit),
# fed the flattened 784-pixel vector.
model = keras.Sequential([
    keras.layers.Dense(10, input_shape=(784,), activation="sigmoid")
])

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

model.fit(x_train_flattened, y_train, epochs=5)

Epoch 1/5
1875/1875 [==============================] - 10s 3ms/step - loss: 0.4755 - accuracy: 0.8752
Epoch 2/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.3042 - accuracy: 0.9160
Epoch 3/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2833 - accuracy: 0.9206
Epoch 4/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.2731 - accuracy: 0.9238
Epoch 5/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2669 - accuracy: 0.9252
<keras.src.callbacks.History at 0x79ed3ebbc190>

model.evaluate(x_test_flattened, y_test)

313/313 [==============================] - 1s 2ms/step - loss: 0.2665 - accuracy: 0.9259


[0.2664738595485687, 0.9258999824523926]

plt.matshow(x_test[10])
plt.matshow(x_test[1])
plt.matshow(x_test[2])

<matplotlib.image.AxesImage at 0x79ed10740a30>

y_pred = model.predict(x_test_flattened)
y_pred[0]

313/313 [==============================] - 1s 2ms/step


array([2.4330137e-02, 2.8497323e-07, 8.3825558e-02, 9.4403744e-01,
2.3193338e-03, 1.0936240e-01, 6.7458177e-07, 9.9973291e-01,
8.5780226e-02, 6.8041223e-01], dtype=float32)

# Convert the 10 per-class scores into a single predicted label via argmax.
y_pred_labels = [np.argmax(i) for i in y_pred]

y_pred_labels[:5]

[7, 2, 1, 0, 4]

y_test[:5]

array([7, 2, 1, 0, 4], dtype=uint8)
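As a quick sanity check, the accuracy reported by model.evaluate can be reproduced by comparing the predicted labels with y_test directly; a one-line sketch:

# Fraction of test examples where the predicted label matches the true label.
np.mean(np.array(y_pred_labels) == y_test)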

import numpy as np

np.argmax(y_pred[10])

# Confusion matrix: rows are true labels, columns are predicted labels.
conf_mat = tf.math.confusion_matrix(labels=y_test, predictions=y_pred_labels)

conf_mat

<tf.Tensor: shape=(10, 10), dtype=int32, numpy=


array([[ 963, 0, 2, 2, 0, 5, 5, 2, 1, 0],
[ 0, 1110, 3, 2, 0, 1, 4, 2, 13, 0],
[ 3, 9, 933, 15, 8, 4, 10, 9, 37, 4],
[ 3, 0, 21, 916, 0, 25, 2, 11, 22, 10],
[ 2, 1, 8, 1, 905, 0, 7, 4, 10, 44],
[ 10, 3, 7, 26, 7, 793, 9, 5, 25, 7],
[ 13, 3, 12, 1, 8, 17, 899, 2, 3, 0],
[ 1, 6, 26, 5, 7, 1, 0, 939, 3, 40],
[ 5, 7, 6, 19, 9, 31, 8, 9, 866, 14],
[ 10, 7, 1, 9, 20, 5, 0, 15, 7, 935]],
dtype=int32)>

import seaborn as sns


plt.figure(figsize =(10,8))
sns.heatmap(conf_mat, annot=True)
plt.xlabel("predicted")
plt.ylabel("Truth")

Text(95.72222222222221, 0.5, 'Truth')

# Add a hidden layer: 100 ReLU neurons before the 10-neuron output layer.
model = keras.Sequential([
    keras.layers.Dense(100, input_shape=(784,), activation="relu"),
    keras.layers.Dense(10, activation="sigmoid")
])

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

model.fit(x_train_flattened, y_train, epochs=5)

Epoch 1/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.2769 - accuracy: 0.9211
Epoch 2/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.1253 - accuracy: 0.9637
Epoch 3/5
1875/1875 [==============================] - 5s 2ms/step - loss: 0.0869 - accuracy: 0.9742
Epoch 4/5
1875/1875 [==============================] - 5s 2ms/step - loss: 0.0652 - accuracy: 0.9808
Epoch 5/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0520 - accuracy: 0.9840
<keras.src.callbacks.History at 0x79ed422c55d0>

model.evaluate(x_test_flattened, y_test)

313/313 [==============================] - 1s 2ms/step - loss: 0.0784 - accuracy: 0.9761


[0.07841683179140091, 0.9761000275611877]

# Recompute predictions with the retrained model before building the confusion
# matrix; otherwise y_pred_labels still holds the first model's predictions.
y_pred = model.predict(x_test_flattened)
y_pred_labels = [np.argmax(i) for i in y_pred]

conf_mat = tf.math.confusion_matrix(labels=y_test, predictions=y_pred_labels)

conf_mat


import seaborn as sns


plt.figure(figsize =(10,8))
sns.heatmap(conf_mat, annot=True)
plt.xlabel("predicted")
plt.ylabel("Truth")

Text(95.72222222222221, 0.5, 'Truth')

# Let a Flatten layer reshape the raw 28x28 images, so no manual flattening is needed.
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(100, activation="relu"),   # input shape inferred from Flatten
    keras.layers.Dense(10, activation="sigmoid")
])

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5)

Epoch 1/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.2678 - accuracy: 0.9242
Epoch 2/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.1173 - accuracy: 0.9651
Epoch 3/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0819 - accuracy: 0.9754
Epoch 4/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0619 - accuracy: 0.9813
Epoch 5/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0502 - accuracy: 0.9846
<keras.src.callbacks.History at 0x79ec8392be20>
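For completeness, the Flatten-based model can be evaluated the same way as the earlier ones; a sketch, with its output not recorded in the original run:

# The Flatten layer handles the reshape, so the 28x28 test images are passed as-is.
model.evaluate(x_test, y_test)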
