
Analogy Between CNN and RNN

Using MNIST Dataset

Under the supervision of

Prof. Rathi R

Assistant Professor Sr. Grade 1

School of Information Technology and Engineering

Submitted By:
Shreya Mandelia 20MCA0225
Ananya Gupta 20MCA0061
Kritika Saini 20MCA0022
Swarnima Singh 20MCA0201
Ananya Sinha 20MCA0210


FINAL REPORT
ABSTRACT

The present era is witnessing a major migration of approaches towards prudent analysis of data. Machine Learning (ML) provides a plethora of approaches for the effective investigation of complex problems, and Deep Neural Networks (DNNs) have become the tool of choice for ML practitioners today. The Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN) are the most popular DNNs of recent times. A CNN is a specialized neural network for processing data with a grid-like input shape, such as the 2D pixel matrix of an image, and CNNs are typically used for image detection and classification. RNNs are a class of neural networks suited to modeling sequence data: they are designed to recognize the sequential characteristics of data and use the learned patterns to predict the next likely scenario. The difference between the ways CNNs and RNNs process data sparks the research interest to analyze the two in terms of efficiency and accuracy. In this respect, the article presents a survey on CNNs and RNNs and also draws an inference on the better approach between them using the MNIST dataset.

INTRODUCTION
In the current age of digitization, handwriting recognition plays an important role in information processing. Nowadays, more and more people use images to represent and transmit information, and it is also common to extract important information from images. Image recognition is an important research area because of its wide range of applications. In the relatively young field of computer pattern recognition, one of the challenging tasks is the accurate automated recognition of human handwriting. This is a challenging problem precisely because handwriting varies considerably from person to person. Although this variance does not cause any problems for humans, it is much harder to teach computers to recognize general handwriting. For an image recognition problem such as handwritten digit classification, it is very important to understand how the data are represented in images. The relevant data are not the raw pixels but the high-level features of the images. For handwritten digit recognition, the structural features of a digit should first be extracted from its strokes; the extracted features can then be used to recognize the digit. High-performance processing of large-scale data is a core technology in the era of big data, yet most current classification and regression machine learning methods are shallow learning algorithms: they struggle to represent complex functions effectively, and their generalization ability is limited for complex classification problems. Deep learning is a multilayer neural network learning approach that has emerged in recent years. Applications of deep learning have been the subject of a number of recent studies, ranging from image classification and speech recognition to audio classification. It has brought a new wave to machine learning and is advancing artificial intelligence and human-computer interaction with big strides. Deep learning algorithms are highly efficient in image recognition tasks such as MNIST digit recognition. In this paper, we apply deep learning algorithms to the handwritten MNIST dataset and explore the two mainstream deep learning architectures, the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN). We also compare the two networks in terms of their approaches and their accuracy and efficiency in predicting the digits.

This article presents a model for the modeling and simulation of a CNN and an RNN to recognize handwritten digits from the MNIST database. The mathematical model of each neural network is implemented in Python with NumPy and TensorFlow.

LITERATURE SURVEY

1. Meshaal Mouawad analyzed the MNIST dataset for handwritten digit recognition. Two different networks were used on the same dataset. In the first network, a 3-layer MLP was used, with ReLU and dropout after each layer. The training process was fast in this network setting, and an overall accuracy of 95% during training and 94% for testing were recorded. In Network II, a stack of convolution, ReLU and max-pooling layers was used. The training process was a little slower, but it gave better accuracy than Network I: 99% overall accuracy for training and 98.9% for testing. They also modified Network II to get better accuracy by adding more filters, reducing the batch size and adding a 30% dropout. The modified Network II gave 99.82% overall accuracy during training and 99.25% accuracy for testing. [1]

2. Ayush Kumar Agrawal, Vineet Kumar Awasthi, Kranti Kumar Dewangan, Deepesh Dewangan and Sameera Khan developed a CNN model for handwritten digit recognition using 4 convolutional layers. They used 32 filters of size 3x3 in the input layer and seven different optimizers to extract the optimal features, and obtained an accuracy of 99.60% on standard data. [2]

3. Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende and Daan Wierstra introduced the Deep Recurrent Attentive Writer (DRAW) neural network architecture and demonstrated its ability to generate highly realistic natural images, such as photographs of house numbers, as well as improving on the best-known results for binarized MNIST generation. They established that the two-dimensional differentiable attention mechanism embedded in DRAW is beneficial not only to image generation but also to image classification. [3]

4. Akmaljon Palvanov and Young Im Cho built a Java-based GUI application around the CapsNet model that reaches very high accuracy even on smaller inputs. It was also faster in real-time applications, recognizing handwritten digits considerably faster (within 1-2 seconds) than other models. It was the most suitable model for real-time application, as the other three models tested were delayed during evaluation and could not reach accuracy as high as the CapsNet model. Although the memory usage of the trained model was higher than that of the CNN and regression models, it predicted results better than both. Based on their experiments, CapsNet also required less time to train and less time to save the model than ResNet. [4]

5. Feiyang Chen, Nan Chen, Hanyang Mao and Hanlin Hu's goal was to discover which model performs better across divided MNIST datasets. They compared four models on different divisions of the MNIST dataset and showed that CapsNet performed best across the datasets. They also observed that CapsNet required only a small amount of data to achieve excellent performance. [5]

6. Mohammed Salemdeeb and Sarp Ertürk focused on deep learning with CNNs to recognize multi-language license plate (LP) characters, covering both the Latin and Arabic characters used in vehicle LPs. A new approach was proposed, analyzed and tested on Latin and Arabic character recognition benchmarks for both LP and handwritten characters. The approach consists of the FDCNN architecture, FDCNN parameter selection and a training process. The proposed full-depth and width-selection ideas were very efficient in extracting features from tiny grayscale images. The complexity of FDCNN was also analyzed in terms of the number of learnable parameters and the memory usage of the feature maps. The full-depth concept of reducing the feature map size to one neuron decreased the total number of learnable parameters while achieving very good results. The FDCNN approach was simple to implement and can be used in real-time applications running on small devices such as mobiles, tablets and some embedded systems. Very promising results were achieved on common benchmarks such as MNIST, FashionMNIST, MADbase, AIA9K, AHCD, Zemris, ReId, UFPR and the newly introduced LPALIC dataset. [6]

7. Satrya Budi Pratama proposed preprocessing the MNIST dataset with a lower-rank SVD approximation, using an improved LeNet-5 model. Based on the experiments, SVD with 10 components improved the performance of the model and gave 99.03% accuracy. With 2 components, the compression reduced the file size by a factor of about 7.87. [7]

8. Savita Ahlawat, Amit Choudhary, Anand Nayyar, Saurabh Singh and Byungun Yoon, with the aim of improving the performance of handwritten digit recognition, evaluated variants of a convolutional neural network to avoid the complex pre-processing, costly feature extraction and complex ensemble (classifier combination) approach of a traditional recognition system. Through extensive evaluation on the MNIST dataset, their work shows the role of various hyperparameters, and they verified that fine-tuning the hyperparameters was essential to improving the performance of a CNN architecture. They achieved a recognition rate of 99.89% with the Adam optimizer for the MNIST database, better than all previously reported results. [8]

9. Vinay Uday Prabhu described the creation of a new handwritten digits dataset for the Kannada language, termed the Kannada-MNIST dataset. They duly open-sourced all aspects of the dataset creation, including the raw scan images, the specific brand of paper used, the exact scanner model used, the signal processing script used to slice and extract the individual digits, and the CNN models used to obtain the baseline accuracies. They attained 97% top-1 accuracy when training and testing on what they termed the main dataset, with 60,000 28 × 28 gray-scale training images and 10,000 test images. They also achieved a top-1 accuracy of 77% when training on the 60,000-image main dataset and testing on 10,240 28 × 28 gray-scale images from what they termed the Dig-MNIST dataset. The images in the Dig-MNIST dataset were noisier, with smudges and grid borders sneaking in during the grid-image segmentation phase. [9]

10. Ritik Dixit, Rishika Kushwah and Samay Pashine implemented three models for handwritten digit recognition on the MNIST dataset, based on deep and machine learning algorithms, and compared their characteristics to identify the most accurate model. The support vector machine is one of the most basic classifiers, so it is faster than most algorithms and in this case gave the maximum training accuracy; however, due to its simplicity it cannot classify complex and ambiguous images as accurately as the MLP and CNN algorithms. They found that CNN gave the most accurate results for handwritten digit recognition. By comparing the execution times of the algorithms, they concluded that increasing the number of epochs without changing the configuration of the algorithm is useless given the limitations of a given model, and they noticed that after a certain number of epochs the model starts overfitting the dataset and gives biased predictions. [10]

CONVOLUTIONAL NEURAL NETWORK

A convolutional neural network (CNN) is a variation of the multi-layer perceptron (MLP) network and was first used in 1980. Computing in a CNN is inspired by the human brain. Humans perceive and identify objects visually: we train our children to recognize objects by showing them hundreds of pictures of each object, which helps a child identify or make predictions about objects they have never seen before. A CNN works in the same fashion and is popular for analyzing visual imagery. A CNN integrates the feature extraction and classification steps and requires minimal pre-processing and feature engineering effort. It can automatically extract rich, interrelated features from images. Moreover, a CNN can provide considerable recognition accuracy even when only a little training data is available.

A basic convolutional neural network comprises three components: the convolutional layer, the pooling layer and the output layer; the pooling layer is sometimes optional. The typical convolutional neural network architecture with three convolutional layers, shown in Figure 1, is well adapted to the classification of handwritten images. It consists of an input layer, multiple hidden layers (repetitions of convolutional, normalization and pooling layers), a fully connected layer and an output layer. Neurons in one layer connect to only some of the neurons in the next layer, which makes scaling to higher-resolution images easier. The pooling (sub-sampling) operation can be used to reduce the dimensions of the input. In a CNN model, the input image is treated as a collection of small sub-regions called "receptive fields". The mathematical operation of convolution is applied to the input layer, which emulates the response (the stimulus) passed to the next layer.
Figure 1. Typical convolutional neural network architecture.

A convolutional layer consists of a series of filters whose parameters must be learned. Each filter's height and width are smaller than the size of the input. To compute an activation map of neurons, each filter is convolved with the input volume. In computer vision, one of the difficulties is that images can be very large and thus computationally costly to work on; for practical use we need faster and computationally cheaper algorithms, and a simple fully connected neural network scales poorly.
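
To make the convolution and pooling operations concrete, the following minimal NumPy sketch (illustrative only, and not the model used in our experiments) slides a single 3x3 filter over a 28x28 input with no padding and stride 1, then sub-samples the activation map with 2x2 max-pooling; the array names and sizes here are assumptions for illustration.

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid (no-padding) 2D convolution of one filter over one channel."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # each output neuron responds to one receptive field of the input
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """2x2 max-pooling: keep the strongest response in each sub-region."""
    h, w = feature_map.shape[0] // size, feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

image = np.random.rand(28, 28)      # a 28x28 grayscale input, as in MNIST
kernel = np.random.rand(3, 3)       # one 3x3 filter with learnable weights
fmap = convolve2d(image, kernel)    # (26, 26) activation map
print(max_pool(fmap).shape)         # (13, 13) after sub-sampling
```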

Figure 2. CNN model for the MNIST digit classification.[1]

RECURRENT NEURAL NETWORK


A Recurrent Neural Network (RNN) is a neural network for sequential data; like other neural networks, it endeavors to recognize the underlying relationships in a set of data through a process that mimics the way the human brain operates. Real-life use cases of RNNs include Google's autocorrect, Apple's Siri, Google voice search, named entity recognition and sentiment analysis. These are all sequence modelling problems because the order of the input matters: in English, for example, "what is your name" makes sense only in that sequence. The RNN is the first algorithm that remembers its input, thanks to an internal memory, which makes it well suited to machine learning problems that involve sequential data, and it is one of the algorithms behind the achievements seen in deep learning over the past few years.

Because of their internal memory, RNNs can remember important things about the input they have received, which allows them to be precise in predicting what comes next. This is why they are the preferred algorithm for sequential data such as time series, speech, text, financial data, audio, video, weather and DNA sequences. In an RNN, information cycles through a loop: when the network makes a decision, it considers the current input as well as what it has learned from the inputs it received previously. A plain RNN has only short-term memory; combined with an LSTM, it also gains long-term memory. A recurrent neural network can remember earlier elements of a sequence because of this internal memory: it produces an output, copies that output, and loops it back into the network.
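
As a minimal illustration of this loop (not our experimental model), the following NumPy sketch implements a vanilla recurrent cell in which the new hidden state is computed from the current input and the previous hidden state, h_t = tanh(W x_t + U h_{t-1} + b); the weight shapes and random inputs are illustrative assumptions.

```python
import numpy as np

def rnn_forward(xs, W, U, b):
    """Run a vanilla RNN cell over a sequence of input vectors."""
    h = np.zeros(U.shape[0])            # initial hidden state (the "memory")
    for x in xs:                        # information cycles through this loop
        h = np.tanh(W @ x + U @ h + b)  # current input + previous state
    return h                            # final state summarizes the sequence

rng = np.random.default_rng(0)
xs = rng.normal(size=(28, 28))          # e.g. an MNIST image read as 28 timesteps of 28 values
W = rng.normal(size=(128, 28)) * 0.01   # input-to-hidden weights
U = rng.normal(size=(128, 128)) * 0.01  # hidden-to-hidden (recurrent) weights
b = np.zeros(128)
print(rnn_forward(xs, W, U, b).shape)   # (128,) hidden summary vector
```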

Figure 3. Block diagram of a diagonal recurrent neural network.

There are two main issues with RNNs. The first is exploding gradients, where the algorithm assigns unreasonably high importance to the weights without much reason; this can be mitigated by truncating or squashing (clipping) the gradients. The second is vanishing gradients, where the gradient values become so small that the model stops learning, or learns far too slowly; this was addressed by the LSTM concept of Sepp Hochreiter and Jürgen Schmidhuber.

Figure 4. Block diagram of a simple recurrent neural network.


DATASET DESCRIPTION
The MNIST database, derived from the NIST database, is a low-complexity data collection of handwritten digits used to train and test various supervised machine learning algorithms. The database contains 70,000 28x28 grayscale images representing the digits zero through nine. The data is split into two subsets, with 60,000 images belonging to the training set and 10,000 images belonging to the testing set. This separation ensures that an adequately trained model can be evaluated on relevant images it has not previously examined.
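
As a brief sketch, the split described above can be loaded with the standard tf.keras dataset loader (which our implementation also relies on); the normalization step shown is a common convention rather than part of the dataset itself.

```python
import tensorflow as tf

# Load the canonical 60,000/10,000 MNIST split.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

print(x_train.shape, x_test.shape)   # (60000, 28, 28) (10000, 28, 28)
print(y_train.min(), y_train.max())  # digit labels 0 through 9

# Scale pixel intensities from [0, 255] to [0, 1] before training.
x_train, x_test = x_train / 255.0, x_test / 255.0
```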

PROPOSED WORK
In recent years, machine learning models have grown significantly and are still escalating. In particular, deep neural networks have achieved great success in various applications, especially tasks involving visual information, and many state-of-the-art models have been introduced that perform dissimilar tasks with high accuracy and effectiveness. In this paper, we explore the modeling and simulation of a CNN and an RNN to recognize handwritten digits from the MNIST database, along with the implementation of the mathematical models in Python with NumPy and TensorFlow, in order to find their effective accuracy and efficiency. TensorFlow is an end-to-end, open-source machine learning platform which efficiently executes low-level tensor operations on CPU, GPU, or TPU, whereas Keras is a simple, flexible and powerful deep learning API written in Python, running on top of TensorFlow with a focus on enabling fast experimentation. Together, the two frameworks provide high-level APIs for building and training models with ease.
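
As a minimal, illustrative example of this division of labor (not taken from our experiments), the snippet below runs a low-level tensor operation directly in TensorFlow and then calls a high-level Keras layer on a tensor:

```python
import tensorflow as tf

# Low-level TensorFlow: tensor operations executed directly (eagerly).
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
print(tf.matmul(a, a))                   # plain matrix multiplication

# High-level Keras: the same platform wrapped in a model-building API.
layer = tf.keras.layers.Dense(10, activation="softmax")
print(layer(tf.zeros((1, 784))).shape)   # (1, 10) class probabilities
```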

Implementing CNN:

The convolutional neural network gained popularity through its use with image data, and is currently the state of the art for detecting what an image is, or what is contained in an image.
Basic structure of the CNN: Convolution -> Pooling -> Convolution -> Pooling -> Fully Connected Layer -> Output
Steps (a code sketch follows the list):

1. Loading the MNIST dataset.
2. Converting the labels from sparse to categorical (one-hot) form.
3. Reshaping and normalizing the input images.
4. Setting the network parameters.
5. Creating the convolutional base for the left and right branches of the Y-network.
6. Merging the outputs of the left and right branches.
7. Feeding the last output of the convolutional base into one or more Dense layers.
8. Building the model in the functional API.
9. Verifying the model using a graph and a layer-by-layer text description.
10. Training the model on the input images and labels for 20 epochs.
11. Testing the accuracy and loss on the test set.
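
A hedged sketch of these steps in the Keras functional API follows. The branch depth and filter counts (three 3x3 convolutional layers per branch, starting at 32 filters and doubling, with dropout and max-pooling after each convolution) are taken from our observations in the experimental analysis; the dropout rate and batch size are illustrative assumptions rather than the exact values used.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Steps 1-3: load MNIST, one-hot the labels, reshape and normalize.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
y_train = tf.keras.utils.to_categorical(y_train)
y_test = tf.keras.utils.to_categorical(y_test)
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0

def branch(inputs):
    """Step 5: one convolutional branch of the Y-network (filters double)."""
    x = inputs
    for filters in (32, 64, 128):
        x = layers.Conv2D(filters, kernel_size=3, padding="same",
                          activation="relu")(x)
        x = layers.Dropout(0.4)(x)   # illustrative dropout rate
        x = layers.MaxPooling2D()(x)
    return x

# Steps 4-8: build the two-branch model in the functional API.
left_in = layers.Input(shape=(28, 28, 1))
right_in = layers.Input(shape=(28, 28, 1))
merged = layers.concatenate([branch(left_in), branch(right_in)])  # step 6
x = layers.Flatten()(merged)
x = layers.Dropout(0.4)(x)
outputs = layers.Dense(10, activation="softmax")(x)               # step 7
model = models.Model([left_in, right_in], outputs)

model.summary()  # step 9: layer-by-layer text description
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])

# Steps 10-11: both branches see the same image; train, then evaluate.
model.fit([x_train, x_train], y_train, epochs=20, batch_size=128,
          validation_data=([x_test, x_test], y_test))
print(model.evaluate([x_test, x_test], y_test))  # [test loss, test accuracy]
```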

Implementing RNN:
Recurrent neural networks (RNNs) are a class of neural networks that is powerful for modeling sequence data such as time series or natural language.
Schematically, an RNN layer uses a for loop to iterate over the timesteps of a sequence, while maintaining an internal state that encodes information about the timesteps it has seen so far.
Basic structure of the RNN: Input -> Hidden Layer -> Output
An RNN has a "memory" which retains information about what has been calculated so far. It uses the same parameters for each input, as it performs the same task on all the inputs and hidden states to produce the output; this reduces the number of parameters, unlike other neural networks.
Steps (a code sketch follows the list):

1. Loading the MNIST dataset.
2. Computing the total number of labels.
3. Converting the labels into one-hot vectors.
4. Resizing and normalizing the input images.
5. Setting the network parameters and building the model.
6. Compiling with a loss function for one-hot vectors, the Adam optimizer and accuracy as the metric.
7. Training the network for 20 epochs.
8. Testing the accuracy and loss on the test set.
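
A hedged sketch of these steps with Keras follows, using a SimpleRNN layer with the parameters noted in our observations (28 timesteps of 28 inputs, 128 hidden units, ReLU followed by dropout, a softmax output, and Adam at a 1e-3 learning rate); the dropout rate and batch size are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Steps 1-4: load MNIST, one-hot the labels, normalize; each 28x28 image
# is fed to the RNN as a sequence of 28 timesteps with 28 values each.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
num_labels = len(set(y_train))                    # step 2: 10 digit classes
y_train = tf.keras.utils.to_categorical(y_train)  # step 3
y_test = tf.keras.utils.to_categorical(y_test)
x_train = x_train.astype("float32") / 255.0       # step 4
x_test = x_test.astype("float32") / 255.0

# Step 5: network parameters and model (128 hidden units over 28 timesteps).
model = models.Sequential([
    layers.SimpleRNN(128, activation="relu", input_shape=(28, 28)),
    layers.Dropout(0.2),  # illustrative dropout rate
    layers.Dense(num_labels, activation="softmax"),
])

# Step 6: loss for one-hot vectors with the Adam optimizer at lr 1e-3
# (the experiments also used a 1e-5 decay; omitted here for portability).
model.compile(loss="categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              metrics=["accuracy"])

# Steps 7-8: train for 20 epochs, then report test loss and accuracy.
model.fit(x_train, y_train, epochs=20, batch_size=128,
          validation_data=(x_test, y_test))
print(model.evaluate(x_test, y_test))
```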

EXPERIMENTAL ANALYSIS

1. Machine Intelligence Tools


Keras was used to implement the CNN and RNN algorithms in this experiment, with TensorFlow (version 2.1.0) as the backend and with the help of other scientific computing libraries, Matplotlib and NumPy.

2. Observation

2.1 CNN

Convolutional layers with a kernel size of three are used to classify the MNIST dataset. The convolutions are performed in 2D using the ReLU activation function and zero-padding. The first convolutional layer uses 32 filters, and the number of filters doubles at every subsequent layer. Dropout and max-pooling are performed after each convolution. The feature maps are then flattened to provide features to the classifier, and the softmax activation function is applied at the output layer. The categorical cross-entropy loss function is optimized using the Adam optimizer, and the classification accuracy metric is evaluated over 20 epochs. The model architecture is illustrated in the following figure.
Figure 5: Observation of CNN
2.2 RNN

The recurrent neural network processes one input at a time; the weight and bias parameters are shared across timesteps, which depend on one another. The architecture of the model has 28 inputs per step, 28 timesteps and 128 hidden neurons. The network is trained using the ReLU activation function followed by dropout layers, and the softmax activation function is applied at the output layer. The sparse categorical cross-entropy loss function is optimized using the Adam optimizer with a learning rate of 1e-3 and a decay rate of 1e-5, and the classification accuracy metric is evaluated over 20 epochs. The model architecture is illustrated in the following figure.
Figure 6: Observation of RNN

3. Result

3.1 CNN

After training on the dataset for 20 epochs, we obtain a test accuracy of 99.34% and a test loss of 0.03. The test accuracy and test loss, compared with the training accuracy and training loss over the 20 epochs, are depicted in the figure below.
3.2 RNN

After training on the dataset for 20 epochs, we obtain a test accuracy of 98.31% and a test loss of 0.06. The test accuracy and test loss, compared with the training accuracy and training loss over the 20 epochs, are depicted in the figure below.

COMPARISON OF THE PROPOSED WORK WITH OTHER PAPERS

Paper | Model | Results
Agrawal, A. K. (2021). Design of CNN Based Model for Handwritten Digit Recognition Using Different Optimizer Techniques. | CNN | Implemented seven different optimizers to extract the optimal features and obtained an accuracy of 99.60% on standard data.
Pratama, S. B. (2021). SVD Implementation on MNIST Image Classification Based on CNN. | CNN | Tested the model on test data with and without SVD; the model without SVD delivered the better accuracy, 98.99%.
Ahlawat, S., Choudhary, A., Nayyar, A., Singh, S., & Yoon, B. (2020). Improved Handwritten Digit Recognition Using Convolutional Neural Networks (CNN). | CNN with 3 layers; CNN with 4 layers | The CNN architecture with three layers delivered the better recognition accuracy, 99.89%, with the Adam optimizer.
Proposed work | CNN and RNN | We use two models, CNN and RNN; the CNN achieves an accuracy of 99.34% while the RNN achieves 98.31%.

CONCLUSION

This paper provides a comprehensive study of the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN), and outlines the major differences between them using the handwritten MNIST dataset. The inference drawn from the experiment is that, on training for 20 epochs, we get a test accuracy of 99.34% and a test loss of 0.03 for the CNN, whereas a test accuracy of 98.31% and a test loss of 0.06 are measured for the RNN. Hence, we can conclude that the CNN outperforms the RNN in terms of efficiency and accuracy on this task, and that it also yields a lower test loss than the RNN. However, the experiment assumes certain conditions, such as the number of layers in each network, the loss function and the optimizer, which can affect the result to some (small) extent. Moreover, it is worth mentioning that other applications have been proposed in recent times where the RNN provides better performance. Overall, the paper serves as an informative article on the analogy between the two most common neural networks, CNN and RNN, using one of the typical datasets, the MNIST handwritten digits dataset.

REFERENCES
1. Mouawad, M. Pattern Recognition of Handwritten Digits, MNIST Dataset.
2. Agrawal, A. K. (2021). Design of CNN Based Model for Handwritten Digit Recognition Using Different Optimizer Techniques. Turkish Journal of Computer and Mathematics Education (TURCOMAT), 12(12), 3812-3819.
3. Gregor, K., Danihelka, I., Graves, A., Rezende, D., & Wierstra, D. (2015, June). DRAW: A recurrent neural network for image generation. In International Conference on Machine Learning (pp. 1462-1471). PMLR.
4. Palvanov, A., & Im Cho, Y. (2018). Comparisons of deep learning algorithms for MNIST in real-time environment. International Journal of Fuzzy Logic and Intelligent Systems, 18(2), 126-134.
5. Chen, F., Chen, N., Mao, H., & Hu, H. (2018). Assessing four neural networks on handwritten digit recognition dataset (MNIST). arXiv preprint arXiv:1811.08278.
6. Salemdeeb, M., & Ertürk, S. (2021). Full depth CNN classifier for handwritten and license plate characters recognition. PeerJ Computer Science, 7, e576.
7. Pratama, S. B. SVD Implementation on MNIST Image Classification Based on CNN.
8. Ahlawat, S., Choudhary, A., Nayyar, A., Singh, S., & Yoon, B. (2020). Improved handwritten digit recognition using convolutional neural networks (CNN). Sensors, 20(12), 3344.
9. Prabhu, V. U. (2019). Kannada-MNIST: A new handwritten digits dataset for the Kannada language. arXiv preprint arXiv:1908.01242.
10. Pashine, S., Dixit, R., & Kushwah, R. (2021). Handwritten Digit Recognition using Machine and Deep Learning Algorithms. arXiv preprint arXiv:2106.12614.

Work Contribution

Work | Done By
Abstract | Shreya Mandelia 20MCA0225
Introduction | Ananya Sinha 20MCA0210 and Shreya Mandelia 20MCA0225
Literature survey and References | Kritika Saini 20MCA0022 and Swarnima Singh 20MCA0201
Convolutional neural network | Ananya Sinha 20MCA0210 and Ananya Gupta 20MCA0061
Recurrent neural network | Swarnima Singh 20MCA0201
Data Set Description | Ananya Gupta 20MCA0061
Proposed Work and Whole document compilation | Kritika Saini 20MCA0022
Experimental and Observation | Shreya Mandelia 20MCA0225
Comparison and Conclusion | Ananya Gupta 20MCA0061
