BELGAUM-590014
SYNOPSIS ENTITLED
BACHELOR OF ENGINEERING
In
INFORMATION SCIENCE AND ENGINEERING
For the Academic year 2023-2024
Submitted by:
M Spoorthi 1MV20IS028
Manasa C M 1MV20IS029
Shreyaa B S 1MV20IS056
Dr. Bhanuprakash G. C.
HOD and Professor, Department of ISE
Sir M Visvesvaraya Institute of Technology, Bangalore
Handwritten text is still far from being fully replaced by its digital form, and handwritten
documents remain abundant. Given this wide scope, handwritten letter recognition using
computer vision and machine learning techniques has been a well-studied problem. The field
has undergone phenomenal development since the emergence of machine learning techniques.
TABLE OF CONTENTS
Introduction
Literature Survey
Objective of the Project
Scope of the Project
Methodology
Technology
Conclusion
Introduction
This paper aims to harness the power of convolutional neural networks (CNNs) for Kannada
handwritten character recognition. Using the Chars74K dataset, we investigate the potential
of CNNs in solving this multifaceted problem. Our research focuses on handling uncertainty
in datasets, a pivotal aspect that significantly affects the success of handwritten character
recognition systems. The subsequent sections present our proposed methodology,
experimental analysis, and concluding remarks.
Signature of Guide Signature of HOD Signature of Coordinator
Literature Survey
Pros:
1. High Accuracy: Achieving a 99% accuracy rate on the Chars74K dataset
demonstrates the effectiveness of the proposed CNN model.
2. Cultural Preservation: This research aims to safeguard Kannada handwritten
heritage, preserving cultural and historical knowledge for future generations.
3. Deep Learning: The adoption of deep learning, particularly CNNs, showcases
the relevance of modern technologies in solving language-specific challenges.
Cons:
1. Dataset Specific: The high accuracy achieved is largely tied to the Chars74K
dataset; model performance on other datasets or real-world documents may
vary.
2. Complexity: The CNN model and its preprocessing pipeline can be computationally
intensive, which is challenging for resource-constrained environments.
3. Limited Generalization: The paper focuses on Kannada character recognition
and doesn't explore broader applications or languages.
2. “Offline Kannada Handwritten Character Recognition Using Convolutional
Neural Networks”
WIECON-ECE, IEEE, 2019
Pros:
1. Data-Driven Approach: CNNs rely on data to automatically learn features,
reducing the need for manual feature extraction and preprocessing.
2. Robustness: Convolution and pooling make CNNs tolerant of small shifts and local
distortions in spatial patterns, which suits handwritten character recognition.
3. Improved Accuracy: The proposed method achieves a high accuracy of up to
93.2%, indicating its effectiveness in recognizing Kannada characters.
Cons:
1. Data Dependency: CNNs require substantial labeled data for training.
Collecting and annotating large datasets can be time-consuming and resource-
intensive.
2. Model Complexity: Deep learning models like CNNs can be complex, which
may require substantial computational resources for training and deployment.
3. Overfitting: CNNs can be prone to overfitting, especially with limited training
data, which may call for mitigation techniques such as data augmentation or dropout.
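A minimal sketch of dropout, one widely used mitigation for the overfitting noted above, is given below in plain Python. It is an illustration under stated assumptions, not the surveyed paper's method: the function name `dropout`, its parameters, and the choice of the "inverted" variant (surviving activations are scaled by 1/(1 - rate) during training so that inference needs no rescaling) are choices made here. Deep learning libraries such as Keras provide the same behaviour as a ready-made layer.

```python
import random
from typing import List


def dropout(activations: List[float], rate: float, rng: random.Random,
            training: bool = True) -> List[float]:
    """Inverted dropout (illustrative sketch): during training each activation
    is zeroed with probability `rate` and survivors are scaled by 1 / (1 - rate),
    so the expected activation is unchanged and inference needs no rescaling."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # identity at inference time
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


if __name__ == "__main__":
    rng = random.Random(0)
    # With rate=0.5, every entry becomes either 0.0 or 2.0.
    print(dropout([1.0] * 10, rate=0.5, rng=rng))
```

Because dropout randomly removes units on every training step, the network cannot rely on any single feature detector, which is why it tends to reduce overfitting when labelled data is scarce.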
3. “Kannada Handwritten Script Recognition using Machine Learning
Techniques”
DISCOVER, IEEE, 2019
Approach: This paper explores two methods for handwritten Kannada script
recognition: Tesseract OCR and Convolutional Neural Networks (CNN). It focuses
on collecting handwritten training data, preprocessing the images, and training
models. Both methods achieved over 80% accuracy, with CNN performing better,
especially when limited to vowels and consonants.
Pros:
CNN:
1. High Accuracy: CNN models excel but demand computational resources.
2. Adaptability: Versatile for diverse fonts and scripts.
Tesseract:
1. Ease of Use: Simple setup for multi-language support.
2. Versatility: Suitable for printed text recognition.
Cons:
CNN:
1. Complexity: Intricate model development.
2. Data Demands: Requires extensive datasets.
Tesseract:
1. Handwriting Challenges: Struggles with handwritten text.
2. Formatting Restrictions: Strict image requirements.
4. “Deep Learning Network Architecture based Kannada Handwritten
Character Recognition”
ICIRCA, IEEE, 2019
Description: This hybrid approach melds Agile and Lean methods, focusing on
adaptability, frequent collaboration, and rigorous quality checks. Open
communication and iterative practices drive its success.
Pros:
1. Flexibility: The method can easily adapt to evolving project requirements,
reducing the risk of scope changes.
2. Efficiency: Streamlined processes and iterative development enhance
productivity.
3. Quality: Continuous testing and feedback lead to higher-quality deliverables.
4. Engagement: Cross-functional teams foster collaboration, boosting team
morale and ownership.
Cons:
1. Complexity: Managing cross-functional teams and dynamic processes can be
challenging.
2. Resource Intensive: Requires time and resources for frequent reviews and
iterations.
3. Client Involvement: It relies heavily on constant client feedback, which may
not always be feasible.
4. Documentation: In the pursuit of agility, comprehensive documentation might
be lacking, potentially impacting future maintenance.
Objective of the Project
Scope of the Project
Methodology
2. Data Preprocessing: Before training the CNN, the dataset is preprocessed. This
may include resizing the images, normalizing pixel values, and augmenting the
data to increase its diversity and size.
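The preprocessing steps above can be sketched in plain Python as follows. This is a minimal illustration under assumptions, not the project's actual pipeline: images are assumed to be grayscale lists of 0-255 pixel rows, resizing uses nearest-neighbour sampling, and augmentation is limited to one-pixel shifts. All function names (`resize_nearest`, `normalize`, `shift`, `augment`) are hypothetical, and a real implementation would more likely rely on an image library.

```python
from typing import List

Image = List[List[float]]  # grayscale image: rows of pixel intensities


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Resize by nearest-neighbour sampling (no interpolation)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def normalize(img: Image, max_value: float = 255.0) -> Image:
    """Scale raw 0..max_value pixel values into the range [0.0, 1.0]."""
    return [[p / max_value for p in row] for row in img]


def shift(img: Image, dy: int, dx: int, fill: float = 0.0) -> Image:
    """Translate the image by (dy, dx) pixels. Uncovered pixels get `fill`,
    which should match the background colour of the scanned characters."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            src_r, src_c = r - dy, c - dx
            if 0 <= src_r < h and 0 <= src_c < w:
                out[r][c] = img[src_r][src_c]
    return out


def augment(img: Image, fill: float = 0.0) -> List[Image]:
    """Return the image plus four copies shifted by one pixel each way."""
    offsets = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    return [img] + [shift(img, dy, dx, fill) for dy, dx in offsets]


if __name__ == "__main__":
    raw = [[0, 255], [255, 0]]
    small = normalize(resize_nearest(raw, 4, 4))  # fixed size, values in [0, 1]
    print(len(augment(small)))  # original plus four shifted variants
```

Small shifts mimic the positional variation between writers, so each augmented copy adds a plausible new training example without new data collection.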
3. Data Splitting: The dataset is divided into training, validation, and test sets. The
training set is used to teach the model, the validation set is used to tune
hyperparameters, and the test set is used to evaluate the model's performance.
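The methodology does not state split ratios, so the sketch below assumes a 70/15/15 split and performs it per class (a stratified split), so that every character class appears in the training, validation, and test sets. This matters when each class has only a few dozen samples. The function name `stratified_split`, the default fractions, and the fixed seed are all illustrative assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Sample = Tuple[object, Hashable]  # (image, label) pair


def stratified_split(samples: Sequence[Sample], val_frac: float = 0.15,
                     test_frac: float = 0.15, seed: int = 42
                     ) -> Tuple[List[Sample], List[Sample], List[Sample]]:
    """Shuffle each class separately and cut it into train/val/test parts."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    by_label: Dict[Hashable, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train: List[Sample] = []
    val: List[Sample] = []
    test: List[Sample] = []
    for label in sorted(by_label, key=repr):  # deterministic class order
        group = by_label[label]
        rng.shuffle(group)
        n_test = int(len(group) * test_frac)
        n_val = int(len(group) * val_frac)
        test.extend(group[:n_test])
        val.extend(group[n_test:n_test + n_val])
        train.extend(group[n_test + n_val:])
    return train, val, test


if __name__ == "__main__":
    # 4 hypothetical classes with 25 samples each.
    data = [(i, i % 4) for i in range(100)]
    tr, va, te = stratified_split(data)
    print(len(tr), len(va), len(te))
```

Keeping the test set untouched until the end, and tuning only against the validation set, prevents hyperparameter choices from leaking into the final performance estimate.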
4. CNN Architecture Design: The architecture of the CNN model is crucial. Convolutional
layers automatically extract relevant features from the handwritten characters, and
pooling layers downsample the resulting feature maps. Several convolutional and pooling
stages are usually stacked, followed by one or more fully connected layers that perform
the classification.
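To make the feature-extraction and downsampling steps concrete, the sketch below implements one convolution, a ReLU activation, and a 2×2 max-pooling step in plain Python. It is a simplified illustration, not the project's architecture: it assumes a single input channel, a single filter, stride 1, no padding, and no bias. As in most deep learning libraries, the kernel is not flipped (strictly a cross-correlation), and for an n × n input and k × k kernel the output is (n - k + 1) × (n - k + 1). In practice, library layers such as Keras `Conv2D` and `MaxPooling2D` would be used, with filter values learned during training rather than fixed by hand as in the example.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(img: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D convolution, stride 1, no padding, kernel not flipped."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(m: Matrix) -> Matrix:
    """Element-wise ReLU: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in m]


def max_pool(m: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling. Rows/columns that do not fill a whole
    window are dropped."""
    return [[max(m[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(m[0]) - size + 1, size)]
            for r in range(0, len(m) - size + 1, size)]


if __name__ == "__main__":
    stroke = [[0, 0, 1, 1]] * 4   # 4x4 image containing a vertical edge
    edge_kernel = [[-1, 1]]       # hand-picked horizontal-gradient filter
    feature_map = relu(conv2d(stroke, edge_kernel))  # 4x3, strong at the edge
    print(max_pool(feature_map))  # 2x1 summary of where the edge occurs
```

The filter responds only where the stroke boundary lies, and pooling keeps that response while halving the spatial resolution, which is the behaviour the stacked layers exploit at larger scale.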
5. Model Training: The CNN model is trained on the training set with an optimization
algorithm, typically stochastic gradient descent (SGD) or one of its variants, using gradients
computed by backpropagation. In this way, the model learns to recognize patterns and
features in the Kannada characters.
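The training loop can be illustrated with a deliberately small model. The sketch below is not a CNN: it trains a single-layer softmax classifier with cross-entropy loss using plain per-example SGD, so the update rule w ← w - η ∇L is visible without a framework. The learning rate, epoch count, toy data, and all function names are hypothetical choices for illustration. At the output layer of any softmax classifier, the gradient of the cross-entropy loss with respect to the logits is p - y (predicted probabilities minus the one-hot label); in a CNN, backpropagation carries that same signal back through the fully connected, pooling, and convolutional layers.

```python
import math
import random
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, class index)


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def logits(W: List[List[float]], b: List[float], x: Sequence[float]) -> List[float]:
    return [sum(w * xi for w, xi in zip(W[k], x)) + b[k] for k in range(len(W))]


def train_sgd(data: List[Example], n_features: int, n_classes: int,
              lr: float = 0.5, epochs: int = 50, seed: int = 0):
    """Train a softmax classifier with per-example SGD on cross-entropy loss."""
    rng = random.Random(seed)
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    data = list(data)  # copy so shuffling does not reorder the caller's list
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            p = softmax(logits(W, b, x))
            for k in range(n_classes):
                # dLoss/dlogit_k = p_k - 1[k == y] for softmax + cross-entropy
                g = p[k] - (1.0 if k == y else 0.0)
                b[k] -= lr * g
                for i in range(n_features):
                    W[k][i] -= lr * g * x[i]
    return W, b


def predict(W, b, x) -> int:
    scores = logits(W, b, x)
    return scores.index(max(scores))


def accuracy(W, b, data: List[Example]) -> float:
    return sum(predict(W, b, x) == y for x, y in data) / len(data)


if __name__ == "__main__":
    toy = [([1, 0, 0], 0), ([0, 1, 0], 1), ([0, 0, 1], 2),
           ([0.9, 0.1, 0], 0), ([0.1, 0.8, 0.1], 1), ([0, 0.2, 0.8], 2)]
    W, b = train_sgd(toy, n_features=3, n_classes=3)
    print(accuracy(W, b, toy))
```

Shuffling the examples each epoch is what makes the descent "stochastic"; evaluating `accuracy` on the held-out test set, rather than on the training data as in this toy run, gives the performance figure described in the data-splitting step.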
Technology
Conclusion
The proposed method classifies and identifies Kannada handwritten characters using deep
learning. It is easy to use, since no manual preprocessing or feature engineering is required
of the user: that work is handled by the neural network, the core of the deep learning
model. This reduces the burden on the user and makes the approach more promising. With a
capsule network, a model trained on a sufficient amount of data is able to recognize
Kannada handwritten characters.
Future work could extend the system to recognize words and, later, whole sentences. The
next stage would be understanding those sentences and producing meaningful responses.
Once the network is able to understand sentences, the model could be trained to summarize
the given text input or to translate it into other languages.