Offline Kannada Handwritten Script Recognition Using Convolution Neural Networks

VISVESVARAYA TECHNOLOGICAL UNIVERSITY
BELGAUM-590014
SYNOPSIS ENTITLED
“Offline Kannada Handwritten Script Recognition
using Convolution Neural Networks”
Submitted for
BACHELOR OF ENGINEERING
In
INFORMATION SCIENCE AND ENGINEERING
For the Academic year 2023-2024
Submitted by:
M Spoorthi 1MV20IS028
Manasa C M 1MV20IS029
Shreyaa B S 1MV20IS056
Project carried out at
Sir M. Visvesvaraya Institute of Technology
Bangalore-562157
Under the Guidance of
Dr. Bhanuprakash G. C.
HOD and Professor, Department of ISE
Sir M. Visvesvaraya Institute of Technology, Bangalore
DEPARTMENT OF INFORMATION SCIENCE & ENGINEERING
SIR M. VISVESVARAYA INSTITUTE OF TECHNOLOGY
HUNASAMARANAHALLI, BANGALORE-562157
ABSTRACT
Handwritten text is still far from being replaced by digital text, and handwritten
material remains abundant. Because of this wide scope, handwritten character recognition
using computer vision and machine learning has been a well-studied problem, and the field
has advanced considerably since the emergence of machine learning techniques.
This paper introduces an Offline Kannada Handwritten Text Recognition system
using Convolutional Neural Networks (CNNs). The primary objective is to extract text
from scanned images, accurately identify Kannada characters, and make them accessible
for various applications. More broadly, the work aims to apply state-of-the-art deep
learning, in the form of CNNs, to automate handwritten character recognition.
Convolutional neural networks are known to perform extremely well on classic
classification problems in computer vision. Using the advantages of this architecture and
of preprocessing-free deep learning techniques, we present a robust, dynamic and fast
method for recognizing handwritten Kannada characters.
To address the scarcity of Kannada training data, handwritten samples are
collected from various sources, and two recognition methods are proposed, both relying
solely on CNNs. Different datasets are explored briefly; specific accuracy figures are
not reported.
TABLE OF CONTENTS
Introduction
Literature Survey
Objective of the Project
Scope of the Project
Methodology
Technology
Conclusion
Introduction
Pattern recognition has been transformed in recent years, primarily by the advent
of deep learning. Unlike traditional methods that rely on hand-crafted feature extraction,
deep learning models learn their features directly from data, which allows them to be
applied to a wide range of problems, including handwritten character recognition. Among
deep learning techniques, Convolutional Neural Networks (CNNs) have gained prominence
due to their adaptability and robustness. CNNs excel at problems that involve spatial
patterns, making them a valuable tool for handwritten character recognition, particularly
for scripts like Kannada.
The complexity of Kannada script, characterized by diverse characters and intricate
spatial relationships, makes conventional Optical Character Recognition (OCR) and
traditional machine learning approaches less suitable. Whereas those methods depend on
manually designed features, CNNs learn the most salient features during training, making
them well suited to Kannada character recognition.
While numerous studies have successfully addressed character recognition in
widely used languages such as English, Chinese, Arabic, and Japanese, the language-specific
challenges of Kannada remain largely unexplored. Kannada, a prominent South Indian
language with 51 unique symbols, poses distinct challenges due to its morphological
complexity.
This paper aims to harness the power of CNNs for Kannada handwritten character
recognition. Using the Chars74K dataset, we examine the potential of CNNs for solving
this multifaceted problem. Our research focuses on handling uncertainty in datasets, a
pivotal aspect that significantly affects the success of handwritten character
recognition systems. The subsequent sections cover the literature survey, objectives,
scope, methodology, and technology, followed by concluding remarks.
Literature Survey
1. “Kannada Handwritten Document Recognition using Convolutional Neural
Network”
CSITSS, IEEE, 2018
Authors: Asha K, Krishnappa H K
Approach: Employing a Convolutional Neural Network (CNN) with meticulous
preprocessing, this study reports 99% accuracy in Kannada character recognition
on the Chars74K dataset.
Description: This paper outlines the development of a Kannada character
recognition system, emphasizing cultural preservation and the significance of deep
learning, and presents a highly effective CNN model.
Pros:
1. High Accuracy: Achieving a 99% accuracy rate on the Chars74K dataset
demonstrates the effectiveness of the proposed CNN model.
2. Cultural Preservation: This research aims to safeguard Kannada handwritten
heritage, preserving cultural and historical knowledge for future generations.
3. Deep Learning: The adoption of deep learning, particularly CNNs, showcases
the relevance of modern technologies in solving language-specific challenges.
Cons:
1. Dataset Specific: The high accuracy achieved is largely tied to the Chars74K
dataset; model performance on other datasets or real-world documents may
vary.
2. Complexity: The CNN model and preprocessing pipeline can be computationally
intensive, which is challenging for resource-constrained environments.
3. Limited Generalization: The paper focuses on Kannada character recognition
and doesn't explore broader applications or languages.
2. “Offline Kannada Handwritten Character Recognition Using Convolutional
Neural Networks”
WEICON-ECE, IEEE, 2019
Authors: Ramesh G, Ganesh N Sharma, J Manoj Balaji, Champa H N
Approach: The approach employs Convolutional Neural Networks (CNNs) to
recognize complex Kannada characters. CNNs learn spatial patterns directly from
data, eliminating manual feature extraction and preprocessing. Separating vowels
and consonants improves accuracy by about 15 percentage points.
Description: A CNN model with 4 convolutional and 2 max-pooling layers,
trained on 18,800 images for 50 epochs, achieved 78.73% accuracy. Separating
vowels and consonants improved accuracy to 93.2%, a gain of roughly 14.5
percentage points.
Pros:
1. Data-Driven Approach: CNNs rely on data to automatically learn features,
reducing the need for manual feature extraction and preprocessing.
2. Robustness: CNNs are known for their robustness in handling spatial patterns,
making them suitable for handwritten character recognition.
3. Improved Accuracy: The proposed method achieves a high accuracy of up to
93.2%, indicating its effectiveness in recognizing Kannada characters.
Cons:
1. Data Dependency: CNNs require substantial labeled data for training.
Collecting and annotating large datasets can be time-consuming and resource-
intensive.
2. Model Complexity: Deep learning models like CNNs can be complex, which
may require substantial computational resources for training and deployment.
3. Overfitting: CNNs can be prone to overfitting, especially with limited training
data, which might necessitate techniques to mitigate overfitting.
3. “Kannada Handwritten Script Recognition using Machine Learning
Techniques”
DISCOVER, IEEE, 2019
Authors: Roshan Fernandes, Anisha P Rodrigues
Approach: This paper explores two methods for handwritten Kannada script
recognition: Tesseract OCR and Convolutional Neural Networks (CNN). It focuses
on collecting handwritten training data, preprocessing the images, and training
models. Both methods achieved over 80% accuracy, with CNN performing better,
especially when limited to vowels and consonants.
Description: The study addresses the unique challenges of Kannada character
recognition, which include variations in handwriting, spacing, and a lack of
training data. Tesseract requires manual labeling and precise image conditions,
while CNN offers automation and flexibility.
Pros:
CNN:
1. High Accuracy: CNN models excel but demand computational resources.
2. Adaptability: Versatile for diverse fonts and scripts.
Tesseract:
1. Ease of Use: Simple setup for multi-language support.
2. Versatility: Suitable for printed text recognition.
Cons:
CNN:
1. Complexity: Intricate model development.
2. Data Demands: Require extensive datasets.
Tesseract:
1. Handwriting Challenges: Struggles with handwritten text.
2. Formatting Restrictions: Strict image requirements.
4. “Deep Learning Network Architecture based Kannada Handwritten
Character Recognition”
ICIRCA, IEEE, 2019
Authors: N Shobha Rani, Subramani A C, Akshay Kumar P, Pushpa B R
Approach: This method integrates Agile and Lean strategies, focusing on
adaptability, cross-functional teams, and iterative development, encouraging
continual improvement through feedback loops.
Description: This hybrid approach melds Agile and Lean methods, focusing on
adaptability, frequent collaboration, and rigorous quality checks. Open
communication and iterative practices drive its success.
Pros:
1. Flexibility: The method can easily adapt to evolving project requirements,
reducing the risk of scope changes.
2. Efficiency: Streamlined processes and iterative development enhance
productivity.
3. Quality: Continuous testing and feedback lead to higher-quality deliverables.
4. Engagement: Cross-functional teams foster collaboration, boosting team
morale and ownership.
Cons:
1. Complexity: Managing cross-functional teams and dynamic processes can be
challenging.
2. Resource Intensive: Requires time and resources for frequent reviews and
iterations.
3. Client Involvement: It relies heavily on constant client feedback, which may
not always be feasible.
4. Documentation: In the pursuit of agility, comprehensive documentation might
be lacking, potentially impacting future maintenance.
Objective of the Project
1. Image preprocessing: The first goal is to perform image preprocessing, ensuring
the input images are optimized for character recognition. Techniques will be
employed for noise reduction, contrast enhancement, normalization, and
binarization. By improving image quality, the subsequent character recognition
process will be more robust.
2. Feature extraction: Following preprocessing, the project intends to extract
relevant features from the characters. This involves capturing spatial, geometric,
and directional attributes. These features will be used to create distinctive feature
vectors for each character, making recognition more accurate.
3. Character recognition: The central objective is to recognize characters
effectively, particularly handwritten Kannada characters. The system will employ
machine learning and deep learning models, such as Convolutional Neural
Networks (CNNs) and Support Vector Machines (SVMs), to identify characters
based on the extracted features.
4. Character classification: The final objective is character classification,
assigning recognized characters to predefined categories. The classification is
multiclass, distinguishing among the various characters in the Kannada script.
Furthermore, the system can be extended to accommodate bilingual or multilingual
recognition, classifying characters from different languages.
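The preprocessing objective can be illustrated with a minimal sketch. It is an example under stated assumptions, not the project's implementation: the image is a tiny hand-made grayscale grid stored as nested lists, the function names `normalize`, `otsu_threshold`, and `binarize` are hypothetical, and Otsu's method is chosen here only as one common binarization technique (the synopsis does not name one).

```python
# Illustrative only: normalization and Otsu binarization in plain Python.
# All names and the sample image are assumptions made for this sketch.

def normalize(image):
    """Linearly rescale 8-bit pixel values to the range [0.0, 1.0]."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant image: avoid division by zero
        return [[0.0 for _ in row] for row in image]
    return [[(p - lo) / (hi - lo) for p in row] for row in image]


def otsu_threshold(image, levels=256):
    """Return the gray level that maximizes the between-class variance."""
    hist = [0] * levels
    for row in image:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(levels):
        w_dark += hist[t]          # pixels with value <= t
        if w_dark == 0:
            continue
        w_light = total - w_dark   # pixels with value > t
        if w_light == 0:
            break
        sum_dark += t * hist[t]
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(image, threshold):
    """Mark dark (ink) pixels as 1 and light (paper) pixels as 0."""
    return [[1 if p <= threshold else 0 for p in row] for row in image]


# A 4x4 patch: light paper with a dark 2x2 blob of ink in the middle.
sample = [
    [250, 245, 240, 248],
    [247,  30,  25, 244],
    [246,  20,  35, 249],
    [251, 243, 242, 250],
]
threshold = otsu_threshold(sample)
binary = binarize(sample, threshold)
scaled = normalize(sample)
```

In a real pipeline these steps would normally be done with an image library on full-size scans; the pure-Python version is only meant to make the arithmetic behind normalization and binarization visible.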
Scope of the Project
Kannada Handwritten Text Recognition has far-reaching applications. Beyond
improving accessibility, the ability to read handwritten Kannada script can contribute
to cultural preservation, administrative efficiency, healthcare, and linguistic research.
1. Accessibility for All: Kannada Handwritten Text Recognition can make content
accessible to all users, including the visually impaired. Recognized text can be
converted into Braille or speech, promoting inclusive education and access to
information.
2. Cultural Preservation: By digitizing historical documents and manuscripts, the
technology preserves Kannada's rich cultural and linguistic heritage, aiding
researchers, historians, and enthusiasts.
3. Efficient Administration: It streamlines administrative tasks in government
organizations, reducing manual labor, improving data accuracy, and enhancing
document processing.
4. Healthcare Advancements: The digitization of handwritten medical records
enhances healthcare efficiency, enabling better patient care and communication
among healthcare professionals.
5. Research and Linguistics: For researchers and linguists, this technology offers
valuable insights into language evolution, variations, and linguistic patterns in
Kannada, contributing to linguistic research and development.
In summary, the applications of Kannada Handwritten Text Recognition range from
accessible education and the preservation of cultural heritage to more efficient
administration, healthcare, and linguistic research, which underscores its potential
impact on both society and culture.
Methodology
1. Data Collection: The first step is to gather a diverse dataset of handwritten
Kannada characters. This dataset should cover a wide range of writing styles,
ensuring that the CNN model can generalize well.
2. Data Preprocessing: Before training the CNN, the dataset is preprocessed. This
may include resizing the images, normalizing pixel values, and augmenting the
data to increase its diversity and size.
3. Data Splitting: The dataset is divided into training, validation, and test sets. The
training set is used to fit the model, the validation set is used to tune
hyperparameters, and the test set is used to evaluate the model's performance.
4. CNN Architecture Design: The architecture of the CNN model is crucial. In this
case, CNN layers are employed to automatically extract relevant features from the
handwritten characters. This architecture usually includes convolutional layers for
feature extraction and pooling layers for downsampling. Multiple convolutional
and fully connected layers may be stacked.
5. Model Training: The CNN model is trained on the training dataset using an
optimization algorithm, typically stochastic gradient descent with gradients
computed by backpropagation. The model learns to recognize patterns and features
in the Kannada characters.
6. Post-processing: Post-processing techniques like character segmentation and noise
removal may be applied to further enhance recognition accuracy.
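The architecture step above mentions convolutional layers for feature extraction and pooling layers for downsampling. The following is a minimal sketch of one such convolution, ReLU, and max-pooling stage in plain Python. The function names, the 6x6 input, and the fixed vertical-edge kernel are all assumptions chosen for illustration; in a trained CNN the kernel values are learned, and a framework would be used instead of nested lists.

```python
# Illustrative only: one convolution -> ReLU -> max-pooling stage.
# Names, input, and kernel are assumptions made for this sketch.

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Replace negative responses with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; incomplete border windows are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# 6x6 input whose right half is bright: it contains one vertical edge.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

features = relu(conv2d_valid(image, kernel))  # 4x4 feature map
pooled = max_pool2d(features, size=2)         # 2x2 after downsampling
```

The feature map is non-zero only where the kernel window straddles the edge, and pooling halves each spatial dimension while keeping the strongest response in each window.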
Technology
1. Data Collection: Gather a large dataset of handwritten Kannada characters. You
can use publicly available datasets or create your own by collecting samples of
handwritten characters.
2. Data Preprocessing:
• Image Resizing: Resize all images to a common size to ensure consistency.
• Image Enhancement: Apply techniques like contrast stretching and noise
reduction to improve image quality.
• Data Augmentation: Create variations of the dataset through techniques like
rotation, scaling, and translation to make the model more robust.
3. Convolutional Neural Networks (CNNs): CNNs are the core technology for
image recognition. You can use deep learning frameworks like TensorFlow or
PyTorch to design and train your CNN models. The architecture of the CNN can
vary, but it typically consists of convolutional layers, pooling layers, and fully
connected layers.
4. Model Training:
• Splitting: Split the dataset into training, validation, and test sets.
• Training: Train your CNN model on the training data, and validate its
performance using the validation set.
• Fine-Tuning: Fine-tune the model by adjusting hyperparameters and
architecture as needed.
5. Data Labeling: Annotate your dataset with the correct Kannada characters or
labels. This is essential for supervised learning.
6. Deployment: Once your model is trained and evaluated, you can deploy it to
various platforms, including web applications. A framework like Flask can be
used to serve the model from a web application.
7. User Interface (UI): Design a user-friendly interface for users to input
handwritten Kannada characters. Technologies like HTML, CSS, and JavaScript
can be used for web-based interfaces.
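The splitting step listed above can be sketched as follows. This is a hedged example, not the project's code: the function name `stratified_split`, the 60/20/20 proportions, the fixed seed, and the three sample labels are all assumptions. The split is done per class (stratified) so that every character appears in the training, validation, and test sets, which matters when some characters have few samples.

```python
# Illustrative only: a stratified train/validation/test split over labels.
# The function name, fractions, and labels are assumptions for this sketch.
import random
from collections import defaultdict


def stratified_split(labels, val_frac=0.2, test_frac=0.2, seed=0):
    """Return (train, val, test) lists of sample indices, split per class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_val = round(len(indices) * val_frac)
        n_test = round(len(indices) * test_frac)
        val.extend(indices[:n_val])
        test.extend(indices[n_val:n_val + n_test])
        train.extend(indices[n_val + n_test:])
    return train, val, test


# Ten hypothetical samples for each of three Kannada vowels.
labels = ["ಅ"] * 10 + ["ಆ"] * 10 + ["ಇ"] * 10
train_idx, val_idx, test_idx = stratified_split(labels)
```

The returned index lists can then be used to select images and labels for each subset; the test indices should be kept aside until the final evaluation.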
Conclusion
The proposed method identifies and classifies handwritten Kannada characters
using deep learning. It is easy for the user because no manual preprocessing of the data
is required; that work is handled by the neural network, which is the core of the deep
learning model. This reduces the burden on the user. With a capsule network, a model
trained on a sufficient amount of data is able to recognize handwritten Kannada
characters.
Future work can extend recognition from characters to words and then to
sentences. The next stage would be understanding the sentences and giving satisfactory
answers. Once the network is able to understand sentences, the model can be trained to
summarize the given text or to translate it into other languages.