
TRIBHUVAN UNIVERSITY

INSTITUTE OF ENGINEERING
PURWANCHAL CAMPUS

A MINOR PROJECT REPORT ON


“CLASSIFICATION OF DYSLEXIA USING CATEGORICAL NAIVE
BAYES ALGORITHM”
In partial fulfillment for the award of the Bachelor’s Degree in Computer Engineering.

SUBMITTED BY:

Guptaraj Shrestha [PUR076BCT034]


Prajita Dhakal [PUR076BCT057]
Prasanga Dahal [PUR076BCT058]

SUBMITTED TO:

DEPARTMENT OF ELECTRONICS AND COMPUTER ENGINEERING


PURWANCHAL CAMPUS
DHARAN

Date: 7 June 2023


DECLARATION
We hereby declare that the report of the project entitled “CLASSIFICATION OF
DYSLEXIA USING CATEGORICAL NAIVE BAYES ALGORITHM” which is being
submitted to the Department of Electronics and Computer Engineering, IOE, Purwanchal
Campus, Dharan in the partial fulfillment of the requirements for the award of the Degree
of Bachelor of Engineering in Computer Engineering, is a bona fide report of the work
carried out by us. The materials contained in this report have not been submitted to any
University or Institution for the award of any degree. We are the sole authors of this
work, and no sources other than those listed here have been used in it.

Guptaraj Shrestha [PUR076BCT034]

Prajita Dhakal [PUR076BCT057]

Prasanga Dahal [PUR076BCT058]

CERTIFICATE OF APPROVAL
The undersigned certify that they have read and recommended to the Department of
Electronics and Computer Engineering, IOE, Purwanchal Campus, a minor project work
entitled “CLASSIFICATION OF DYSLEXIA USING CATEGORICAL NAIVE
BAYES ALGORITHM” submitted by Guptaraj Shrestha, Prajita Dhakal and Prasanga
Dahal in partial fulfillment for the award of Bachelor’s Degree in Computer Engineering.
The project was carried out under special supervision and within the time frame
prescribed by the syllabus. We found the students to be hardworking, skilled and ready to
undertake any work related to their field of study, and hence we recommend this project in
partial fulfillment of the Bachelor’s Degree in Computer Engineering.

Head of Department

Mr. Manoj Kumar Guragai

Department of Electronics and Computer Engineering, Purwanchal Campus

Date: 7 June 2023

ACKNOWLEDGEMENT
We would like to express our thanks to Mr. Manoj Kumar Guragai of the Department of
Electronics and Computer Engineering, IOE Purwanchal Campus, for his cooperation in
the drafting of this proposal. We would also like to express our appreciation to Mr. Pukar
Karki, who helped us with his valuable recommendations. We also thank
the Department and renowned academics for making this possible. Finally, this project
would not have been attainable without the aid of our friends and family members.

Guptaraj Shrestha [PUR076BCT034]

Prajita Dhakal [PUR076BCT057]

Prasanga Dahal [PUR076BCT058]

ABSTRACT
Dyslexia is a neurological condition that interferes with a person's capacity for reading, writing,
and spelling. It is a common learning disability that affects up to 20% of people worldwide.
Dyslexia Connect provides a variety of resources for people with the condition, including a test
series that helps individuals identify their symptoms. Individuals can cope with this condition if
they are given the right environment and support. Otherwise, they may become victims of social
injustice or bullying, or suffer from low self-esteem. The proposed system aids in predicting
whether or not an individual is dyslexic in order to make them aware of their condition. This
study proposes a Categorical Naive Bayes classifier that makes this prediction from the results of
a test. Our dataset is built from three major tests covering reading, listening and writing, and our
ML model predicts the probability of dyslexia based on user input. Before beginning the project,
considerable research and literature review were needed to understand how to implement the
model effectively. Ultimately, our Categorical NB classifier achieved an accuracy of around 97
percent after training on more than 500 labeled samples collected from different resources,
considering practical features.

Keywords: Analysis of Dyslexia, NB Classifier, Categorical NB Classifier, Python, Django

TABLE OF CONTENTS
DECLARATION
CERTIFICATE OF APPROVAL
ACKNOWLEDGEMENT
ABSTRACT
1. INTRODUCTION
1.1 Brief Introduction
1.2 Background
1.3 Problem Statement
1.4 Objective
1.5 Applications
1.6 Scope
1.7 Requirements
1.7.1 Hardware Requirements
1.7.2 Software Requirements
2. LITERATURE REVIEW
3. RELATED THEORY
3.1 NAIVE BAYES CLASSIFIER
3.2 CATEGORICAL NAIVE BAYES CLASSIFIER
4. METHODOLOGY
4.1 General Description
4.2 Block Diagrams
4.2.1 Block diagram for Model
4.2.2 System Block Diagram
4.3 Sequence Diagram
4.4 Activity Diagram
4.5 Use Case Diagram
5. FEASIBILITY STUDY
5.1 Technical Feasibility
5.2 Economic Feasibility
5.3 Schedule Feasibility
6. TOOLS AND TECHNOLOGIES
6.1 Tools
7. RESULT AND DISCUSSION
7.1 Home Page
7.2 Test
7.2.1 Spell Each Letter:
7.2.2 Choose Pronunciation
7.2.3 Listening and Typing Test
7.3 Result
8. APPLICATIONS AND LIMITATIONS
9. FURTHER ENHANCEMENT
10. CONCLUSION
11. PROJECT SCHEDULE
REFERENCES

1. INTRODUCTION
1.1 Brief Introduction
Dyslexia, also referred to as a reading and writing disorder, is a neurological condition that
interferes with a person's capacity for reading, writing, and spelling. It is a common learning
disability that affects an estimated 15-20% of people worldwide. People with dyslexia have trouble
processing language, which can make it challenging for them to read, write, and comprehend
written material. Dyslexia is often diagnosed in childhood, but it can affect people of all ages.

Our website, "Dyslexia Connect", provides a variety of resources for people who have
dyslexia, including details on the condition's early symptoms, evaluation, and diagnosis. The
website also provides a text-to-speech test series, suited to the user's age group, that anyone can
use to evaluate their symptoms.

1.2 Background
Living with a reading and writing disorder can be challenging, and those affected may also
experience social and emotional difficulties. Dyslexia can cause frustration and anxiety, especially
when trying to complete written assignments or communicate effectively with others. This can
lead to feelings of isolation, low self-esteem, and a lack of confidence in one's abilities.

Dyslexia is not a sign of low intelligence; people with dyslexia often have average or
above-average intelligence. It is a lifelong condition, but with appropriate interventions and
support, people with dyslexia can learn to read, write, and spell effectively. Effective
interventions include multisensory teaching techniques and assistive technologies such as
text-to-speech software and audiobooks.

This system provides a test for dyslexia, along with information about the condition, in a simple
and understandable way, so that individuals can become aware of their condition and seek
effective treatment.

1.3 Problem Statement


Dyslexia can impact many areas of an individual's life. People with dyslexia have trouble reading
and writing, and as a result they may also struggle with social anxiety, misunderstanding,
professional challenges, poor communication skills, and low self-esteem. They struggle in their
daily lives and in their studies because their symptoms go unrecognized and misunderstood.

1.4 Objective
The primary objective of our website is to provide accurate information about dyslexia and its
symptoms, together with a test series for its identification. This information is written in clear,
accessible language that is easy for dyslexic individuals to understand.

1.5 Applications
● Dyslexic individuals and Families: They could use our website to access information
about dyslexia, and to identify their condition.
● Education: Our website could be used as a resource for educators who work with dyslexic
students. Teachers could access information about dyslexia, instructional strategies, and assistive
technology that could help them support their students.

1.6 Scope
● Information on dyslexia
● Support for individuals with dyslexia.
● Resources for parents and educators.
● Advocacy and awareness.

1.7 Requirements
1.7.1 Hardware Requirements

● Processor: i5 10th Gen
● Hard Disk Drive: 50 GB or above
● Monitor: LCD
● RAM: 8 GB or above

1.7.2 Software Requirements

● Language: Python
● Operating System: Windows 11
● Documentation: Google Docs/Slides
● Programming Environment: Visual Studio Code/Jupyter Notebook

2. LITERATURE REVIEW
Learning Ally (learningally.org) is an online library that provides audiobooks specifically
designed for students with dyslexia, blindness, and other learning disabilities. The platform
offers a wide range of books, including textbooks, literature, and popular fiction, which are
professionally narrated to support individuals who struggle with reading print material. Learning
Ally aims to help students overcome reading barriers and access educational content in a format
that suits their learning needs.

Lexercise is an online platform that offers free dyslexia screening tools for both children and
adults. These screening tests are designed to identify potential signs of dyslexia by evaluating
reading, writing, spelling, and phonics skills. The tests typically take around 5-10 minutes to
complete and can provide valuable insights into an individual's reading difficulties. However, it's
important to note that these screening tools are not diagnostic tools, and a formal assessment by a
qualified professional is necessary for an official dyslexia diagnosis.

Understood (understood.org) is a comprehensive website that provides a wealth of resources,
tools, and community support for children and adults with learning and attention issues,
including dyslexia. The website offers a wide range of information, tips, and strategies to support
individuals with dyslexia in various aspects of their lives, such as education, work, and daily
activities. Understood also provides access to expert advice, personal stories, assistive
technology recommendations, and a supportive community where individuals can connect with
others facing similar challenges.

Overall, Learning Ally, Lexercise, and Understood are valuable resources that can provide
support, information, and tools to individuals with dyslexia and other learning disabilities. They
aim to empower individuals by helping them understand their learning differences, access
appropriate accommodations, and develop strategies to thrive academically and personally. It's
always recommended to consult with professionals and educators to receive a comprehensive
assessment and personalized guidance for addressing specific learning needs.

3. RELATED THEORY
3.1 NAIVE BAYES CLASSIFIER
The Naive Bayes classifier is a popular machine learning algorithm that is based on the
principles of Bayesian probability. It is a simple and efficient classification algorithm that is
particularly well-suited for text classification and spam filtering tasks.

The Naive Bayes classifier makes use of Bayes' theorem, which is a fundamental concept in
probability theory. Bayes' theorem provides a way to calculate the probability of a hypothesis
(class label) given some observed evidence (feature variables). In the context of classification, it
helps determine the probability of a particular class label given the observed features.

The "Naive" in Naive Bayes comes from the assumption of independence made by the algorithm.
It assumes that the feature variables are conditionally independent of each other given the class
label. This means that, once the class label is known, the value of one feature carries no
additional information about the value of any other feature.
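
The two ideas above can be written compactly. The notation here is an assumption introduced for illustration and does not appear in the original report: $y$ denotes the class label and $x_1, \dots, x_n$ denote the observed feature values. The first equality is Bayes' theorem; the final step uses the conditional independence assumption and the fact that the denominator does not depend on $y$.

```latex
P(y \mid x_1, \dots, x_n)
  = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}
  \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)
```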

The algorithm works as follows:

Training: The Naive Bayes classifier requires a labeled training dataset, where each instance is
associated with a class label. During the training phase, the algorithm calculates the prior
probabilities of each class label and the likelihoods of each feature variable given each class
label. The prior probability represents the probability of encountering each class label in the
dataset, while the likelihood represents the probability of observing each feature variable for
each class label.

Probability Estimation: The prior probabilities and likelihoods are estimated from counts in the
training data. The prior probability of each class label is its relative frequency in the training
dataset. The likelihood of each feature value for a class label is estimated by counting how often
that value occurs among the training instances of that class label.

Classification: When a new instance with unknown class label is presented to the Naive Bayes
classifier, it calculates the posterior probability of each class label given the observed feature
variables using Bayes' theorem. The posterior probability is the probability of each class label
given the observed evidence. The class label with the highest posterior probability is selected as
the predicted class label for the new instance.
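
The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the project: the function names `train` and `posterior`, the two-question toy data, and the "YES"/"NO" labels are all hypothetical, and no smoothing is applied, so a feature value never seen for a class gives that class a probability of zero.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, label)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts

def posterior(row, class_counts, value_counts):
    """Return the normalized posterior probability of each class label."""
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # prior P(y)
        for i, value in enumerate(row):
            # likelihood P(x_i | y), estimated by counting (no smoothing)
            score *= value_counts[(i, label)][value] / n_label
        scores[label] = score
    norm = sum(scores.values())
    if norm == 0:
        # Every class received zero probability; fall back to a uniform answer.
        return {label: 1 / len(scores) for label in scores}
    return {label: s / norm for label, s in scores.items()}

# Hypothetical toy data: two questions, 1 = correct answer, 0 = incorrect answer.
rows = [(1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (0, 0), (0, 1), (1, 0)]
labels = ["NO", "NO", "NO", "NO", "YES", "YES", "YES", "YES"]

class_counts, value_counts = train(rows, labels)
probs = posterior((0, 0), class_counts, value_counts)
predicted = max(probs, key=probs.get)  # class with the highest posterior
print(probs, predicted)
```

The prediction is simply the label with the largest posterior. In practice a smoothed estimate is usually preferred, so that a single unseen feature value does not rule out a class entirely.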

The Naive Bayes classifier is computationally efficient and can handle large datasets with high-
dimensional feature spaces. It is based on strong assumptions about the independence of the
feature variables, which may not always hold in practice. Despite this limitation, Naive Bayes
classifiers often perform well in many real-world applications and are widely used for text
classification, email spam filtering, sentiment analysis, and more.

Each Naive Bayes variant expects a different type of data:

● GaussianNB: for continuous features.
● CategoricalNB: for categorical features.
● MultinomialNB: for discrete count features, such as word counts in text data.

3.2 CATEGORICAL NAIVE BAYES CLASSIFIER
Categorical Naive Bayes is a popular probabilistic machine learning algorithm used for
classification tasks, especially when dealing with categorical or discrete features. It is based on
Bayes' theorem and assumes that the features are conditionally independent of one another given
the class label. This assumption simplifies the probability calculations, because each feature's
contribution to the class label can be considered independently.

The conditional independence assumption in Categorical Naive Bayes simplifies the model and
reduces computational complexity. However, it is worth noting that this assumption may not
hold in all scenarios. In real-world datasets, features can be dependent on each other to some
extent. If there is strong dependency among the features, the assumption of conditional
independence may result in a less accurate model. In such cases, more advanced algorithms, like
Bayesian networks or tree-based classifiers, may be more appropriate.

Categorical Naive Bayes is a simple yet powerful algorithm for text classification, spam filtering,
document categorization, and other tasks involving categorical features. It is computationally
efficient and can handle large datasets with high-dimensional feature spaces.
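
For reference, the per-category likelihood that a Categorical Naive Bayes model estimates can be written as follows. The notation is taken from the scikit-learn documentation for `CategoricalNB` rather than from this report: $N_{tic}$ is the number of training samples of class $c$ in which feature $i$ takes category $t$, $N_c$ is the number of training samples of class $c$, $n_i$ is the number of available categories of feature $i$, and $\alpha$ is a smoothing parameter.

```latex
P(x_i = t \mid y = c;\ \alpha) = \frac{N_{tic} + \alpha}{N_c + \alpha\, n_i}
```

With $\alpha > 0$, a category that never occurs for a class in the training data still receives a small non-zero probability.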

4. METHODOLOGY
4.1 General Description
The proposed system, which uses a Naive Bayes Classifier (NBC) for dyslexia analysis, includes
the following blocks:

1. Dataset: This refers to the original dataset that contains the input features (Q01 to Q16)
and the target variable (RESULT). The dataset is typically stored in a file, such as a CSV file,
and is loaded into memory for further processing.

2. Data Preprocessing: This step involves preparing the dataset for training and testing. It
may include various data preprocessing techniques, such as handling missing values, encoding
categorical features into numerical representations, or scaling numerical features to a common
range. Data preprocessing ensures that the data is in a suitable format for the machine learning
model.

3. Preprocessed Data: This is the dataset after applying the data preprocessing techniques.
It contains the transformed features and the target variable in a format ready for further analysis
and model training.

4. Train-Test Split: The preprocessed data is split into two sets: the training set and the
testing set. The training set is used to train the machine learning model, while the testing set is
used to evaluate the model's performance on unseen data. The split is typically done randomly,
allocating a certain percentage of the data to each set (e.g., 80% for training, 20% for testing).

5. Training Set: This subset of the preprocessed data is used to train the CategoricalNB
model. It includes the input features (Q01 to Q16) and the corresponding target variable values
(RESULT) used to teach the model the underlying patterns and relationships.

6. CategoricalNB Training: This step involves training the CategoricalNB model using the
training set. The model learns the probabilities and distributions of the categorical features for
each class (RESULT = "YES" and RESULT= "NO") based on the training data.

7. Trained CategoricalNB Model: Once training is complete, the CategoricalNB model can be
used for prediction. It encapsulates the learned probabilities and distributions, enabling it to
make predictions on unseen data.

8. Testing Set: This subset of the preprocessed data is used to assess the performance of the
trained CategoricalNB model. Only the input features (Q01 to Q16) are passed to the model; the
target variable values are withheld. The model predicts the target variable values for this set,
which are then compared to the actual values for evaluation.

9. CategoricalNB Prediction: Using the trained CategoricalNB model, predictions are
made for the testing set. The model applies the learned probabilities and distributions to calculate
the posterior probabilities of each class given the observed feature values.

10. Predicted Target Variable: These are the values of the target variable (RESULT) that the
CategoricalNB model predicts for the testing set. They indicate whether the model classifies an
instance as "YES" or "NO" for dyslexia based on the input features.

11. Accuracy Evaluation: The predicted target variable values are compared with the actual
target variable values from the testing set. The accuracy score is calculated by measuring the
proportion of correct predictions out of the total predictions made.

12. Accuracy: The accuracy score is an evaluation metric that quantifies the performance of
the CategoricalNB model. It represents the percentage of correctly predicted instances out of the
total instances in the testing set. Higher accuracy values indicate better performance of the model
in correctly classifying instances.
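
The blocks listed above can be sketched end to end with scikit-learn. This is a hedged illustration rather than the project's actual code: the real dataset is not reproduced here, so the example generates synthetic binary answers for sixteen questions (standing in for Q01 to Q16), and the answer rates of 90% and 30% are assumptions chosen only to make the two classes separable. The accuracy it prints therefore says nothing about the accuracy reported for the real data.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB

rng = np.random.default_rng(seed=0)

# Synthetic stand-in for the dataset: 500 rows, 16 binary answers (Q01..Q16).
n_samples, n_questions = 500, 16
y = rng.integers(0, 2, size=n_samples)  # 1 = "YES" (dyslexic), 0 = "NO"
# Assumed for illustration: "NO" rows answer correctly 90% of the time, "YES" rows 30%.
p_correct = np.where(y == 1, 0.3, 0.9)
X = (rng.random((n_samples, n_questions)) < p_correct[:, None]).astype(int)

# Train-test split: 80% for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# min_categories=2 states that every question has two possible answers,
# even if one of them were missing from the training split.
model = CategoricalNB(min_categories=2)
model.fit(X_train, y_train)

# Predict on the held-out features and compare with the held-out labels.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy on held-out data: {accuracy:.2%}")
```

Because the features are already integer-coded (0 or 1), no separate encoding step is needed here; string-valued answers would first have to be mapped to non-negative integers.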

4.2 Block Diagrams


4.2.1 Block diagram for Model

Fig 4.1: Block Diagram for Categorical NB Algorithm

4.2.2 System Block Diagram

Fig 4.2: System Block Diagram

4.3 Sequence Diagram

Fig 4.3: Sequence Diagram for the system

4.4 Activity Diagram

Fig 4.4: Activity Diagram for the system

4.5 Use Case Diagram

Fig 4.5: Use case diagram for system

5. FEASIBILITY STUDY
5.1 Technical Feasibility
We used Python as our programming language, which is easy to handle. VS Code is a
lightweight, cross-platform editor that offers a wide range of features and extensions that are
useful for web development, making it a suitable option for our project. So, the project is
technically feasible.

5.2 Economic Feasibility


Few human resources were needed for this project, and the hardware and software required
to build it were not expensive. So, it is economically feasible.

5.3 Schedule Feasibility


To develop the project, a proper timeline was prepared so that each portion of the project could
be completed within its scheduled time period. Most of the necessary resources were found on
the web and were available in time to begin research. In addition, all the related software packages
were easily available, which makes the schedule more feasible.

6. TOOLS AND TECHNOLOGIES


For developing this project, we started with a small module and kept adding new features
in each increment. Thus, we followed the Incremental Model of the Software
Development Life Cycle (SDLC). In each increment, we tested our model thoroughly and
then added some features to move to the next increment. Finally, we arrived at our
fully tested system, as shown in the following block diagram.

Fig 6.1: Incremental Model

6.1 Tools
● Python: Programming language
● Visual Studio Code: Code Editor
● Scikit-Learn: Python library for building the model
● React.js: Frontend
● Django: Python framework for the backend
● react-speech-recognition: React library used for speech recognition

7. RESULT AND DISCUSSION

7.1 Home Page

7.2 Test

7.2.1 Spell Each Letter:


Here, a word is displayed, and the user must correctly spell each letter while pausing at
appropriate intervals. If the user spells every letter correctly, the answer to that question is stored
as ‘1’; if any letter is spelled incorrectly, the answer to that question is stored as ‘0’.
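
The scoring rule described above can be sketched as a small function. This is an assumption-laden illustration, not the project's code: the function name `score_spelling`, the example words, and the choice to ignore letter case and surrounding whitespace are all hypothetical.

```python
def score_spelling(target_word, spelled_letters):
    """Return 1 if the spelled letters match the target word exactly, else 0."""
    expected = [ch.lower() for ch in target_word]
    # Assumption: comparison ignores case and stray whitespace around each letter.
    given = [ch.strip().lower() for ch in spelled_letters]
    return 1 if given == expected else 0

answers = [
    score_spelling("cat", ["c", "a", "t"]),  # every letter correct
    score_spelling("dog", ["d", "o", "b"]),  # last letter wrong
    score_spelling("sun", ["s", "u"]),       # a letter is missing
]
print(answers)
```

Each returned value would then become one binary feature in the input passed to the classifier.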

7.2.2 Choose Pronunciation


A word is displayed in the left box, and the user must listen to the word and choose the correct
pronunciation in the right box.

7.2.3 Listening and Typing Test


The user hears the pronunciation of a word in the left box and must type the word they hear in
the field provided in the right box.

7.3 Result
In the first example, after all of the test questions have been answered, the result indicates that the
person has a 0.08% chance of being dyslexic.

In the second example, after all of the test questions have been answered, the result indicates that
the person has a 92.45% probability of being dyslexic.
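
A percentage of this kind can be obtained from the posterior probability that a trained classifier assigns to the "YES" class. The sketch below shows one way this might be done with scikit-learn's `predict_proba`. It is an illustration only: the eight-row training set with four questions is made up, the helper name `dyslexia_percentage` is hypothetical, and the percentages it produces are unrelated to the figures reported above.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Tiny made-up training set: 4 binary answers per person (1 = correct, 0 = incorrect).
X_train = np.array([
    [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 1],  # labeled "NO"
    [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0],  # labeled "YES"
])
y_train = np.array(["NO"] * 4 + ["YES"] * 4)

model = CategoricalNB().fit(X_train, y_train)

def dyslexia_percentage(answers):
    """Return the posterior probability of the "YES" class as a percentage."""
    proba = model.predict_proba(np.array([answers]))[0]
    yes_index = list(model.classes_).index("YES")
    return 100 * proba[yes_index]

p_all_correct = dyslexia_percentage([1, 1, 1, 1])
p_all_wrong = dyslexia_percentage([0, 0, 0, 0])
print(f"All correct: {p_all_correct:.2f}%  All wrong: {p_all_wrong:.2f}%")
```

Looking up the column through `model.classes_` avoids relying on the order in which the class labels happen to be stored.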

8. APPLICATIONS AND LIMITATIONS
8.1 Applications and Scope
● Self-Assessment: Dyslexia checking websites can provide a platform for individuals to
self-assess and gather preliminary information about their reading and language skills.
These websites may offer interactive quizzes, questionnaires, or screening tools that users
can complete to evaluate their own difficulties and determine if further evaluation or
support is needed.
● Awareness and Education: Dyslexia checking websites can serve as educational
resources, providing information about dyslexia, its signs and symptoms, and how it
impacts learning. They can help raise awareness about dyslexia among individuals,
parents, educators, and the general public, promoting understanding and reducing stigma.
● Early Intervention: Dyslexia checking websites can be valuable tools for early
intervention. They can help identify potential indicators of dyslexia in children at an early
stage, allowing parents and educators to seek further assessment and support. Early
intervention is crucial for implementing appropriate strategies and interventions to
support children with dyslexia.

8.2 Limitations

● Lack of data availability: Comprehensive and diverse datasets specifically tailored to
individuals with dyslexia are scarce. This scarcity can affect various aspects of dyslexia
research, assessment, and the development of effective interventions or technologies.

● Voice recognition: Voice recognition technology can face challenges when used in the
context of dyslexia. Dyslexia is a learning difference that primarily affects reading, writing,
and spelling abilities, and it does not necessarily impact speech or language production.
However, dyslexic individuals may still face difficulties with speech recognition systems
due to factors such as pronunciation variations, speech impediments, or challenges in
enunciating certain words or sounds.

9. FURTHER ENHANCEMENT
The user experience can be improved by refining the user interface, adding more test
questions, and increasing the number of categories. We can also add a database to store the
input data along with the corresponding results and append them to the dataset. The test can be
expanded to examine other learning disabilities, such as dysgraphia. To help dyslexic individuals
even more, we can add resources such as books, audiobooks, videos, and articles. With feedback
from users, we will also try to make appropriate changes to our model in the future. Furthermore,
the dataset can be created in the native language (i.e., Nepali) for more accurate classification in
our region.

10. CONCLUSION
Dyslexia is frequently misunderstood as a mental illness, when in fact it is a learning disorder
whose symptoms are visible in children from an early age. Individuals can cope with this condition
if they are given the right environment and support. Otherwise, they may become victims of social
injustice or bullying, or suffer from low self-esteem. This system aids in predicting whether or not
an individual is dyslexic in order to make them aware of their condition.

This study proposed a Categorical Naive Bayes classifier for predicting, from the results of a test,
whether or not an individual is dyslexic. Our system consists of three major tests, namely Spell
Each Letter, Choose Pronunciation, and Listening and Typing, and our ML model predicts the
probability of dyslexia based on user input.

Before beginning the project, considerable research and literature review were needed to
understand how to implement the model effectively. Ultimately, our NB classifier achieved an
accuracy of around 97 percent after training on more than 500 labeled samples collected from
different resources, considering practical features.

11. PROJECT SCHEDULE

REFERENCES
[1] International Dyslexia Association, "IDA - International Dyslexia Association," [Online].
Available: https://dyslexiaida.org/.

[2] National Center for Learning Disabilities, "NCLD - National Center for Learning
Disabilities," [Online]. Available: https://www.ncld.org/.

[3] Yale Center for Dyslexia and Creativity, "Yale Center for Dyslexia and Creativity," [Online].
Available: https://dyslexia.yale.edu/.

[4] Reading Rockets, "Reading Rockets - Dyslexia," [Online]. Available:
https://www.readingrockets.org/reading-topics/dyslexia.

[5] Understood.org, "Understood - For Learning and Attention Issues," [Online]. Available:
https://www.understood.org/.
