CS Department
Project ID: UQU-CS-2022F-02
2 / 2023
اتقاء
By:
Name ID
Manar Ahmad Saeed Bajafar 438018415
Faiza Mohammed Usman Baran 438018483
Lama Saleh Abdullah Alzughaybi 439010549
Maram Nasser Muslih Alsaedi 439000020
Thraa Freed Hassan Serdar 439004340
Supervisor name:
Dr. Olfat Meraj Mirza
ACKNOWLEDGMENT:
First of all, we would like to thank Allah Almighty and express our gratitude to Him for helping us with this project despite the difficulties. We would also like to thank everyone who helped us during the project, whether with advice or service. We are pleased to thank our supervisor, Dr. Olfat Mirza, for her advice and guidance during the project. We also thank our families for their continued support and for always being by our side. We cannot adequately express how grateful we are to them.
ABSTRACT
In today's world, social media has become an essential part of daily life, and WhatsApp is a prominent example: most people have heard of it, whether or not they are active social media users. Middle Easterners are one of WhatsApp's largest user groups. In recent years, an increasing number of Arab children have used WhatsApp to communicate with others on a local and global scale. This may have several negative consequences in their lives, including being bullied and harassed online. We therefore propose Etiqa'a | اتقاء, an application that aims to minimize these risks and keep threats against minors from becoming a reality. The application receives WhatsApp messages, analyzes them, and classifies each message as appropriate or inappropriate based on the text of the conversation. The classification uses a machine learning model based on the Logistic Regression (LR) algorithm, which achieved an accuracy of 81.2% in our results. The application then sends the parents a detailed alert about the inappropriate content that was detected. We believe that our project will have a significant impact and will provide more security to young WhatsApp users.
Keywords: Machine learning, Artificial intelligence, AI, Natural Language Processing, NLP, WhatsApp,
monitoring private messages, Arabic text classification, message classification.
TABLE OF CONTENTS
ACKNOWLEDGMENT
2.3 Arabic Natural Language Processing (ANLP) based on Machine Learning
2.4.1 Extract WhatsApp messages from the phone's local storage
3.1.2 Non-functional requirements
7.1 Development Tools
7.3.2 Create an Event to Delete Unsaved Messages Every Two Weeks
9.3 Future Plans
LIST OF FIGURES
Figure 1: Cyber Harassment by Age Group (%)
Figure 2: Percentage of minors’ accessible devices and their usage of WhatsApp
Figure 3: Supervised Learning Mechanism
Figure 4: Unsupervised Learning Mechanism
Figure 5: Software Use Case Diagram – Etiqa’a | اتقاء system main functions (ESMF)
Figure 6: Software Use Case Diagram – Etiqa’a | اتقاء system on parents’ device
Figure 7: Software Use Case Diagram – Etiqa’a | اتقاء system on child’s device
Figure 8: Software context level diagram
Figure 9: Software DFD Diagram – Level 0
Figure 10: Software DFD Diagram – Level 1 – Send an alert
Figure 11: Software DFD Diagram – Level 1 – Edit account settings
Figure 12: Software Sequence Diagram – Parent device (create account, log in)
Figure 13: Software Sequence Diagram – Parent device (add child), child device (log in, give permission)
Figure 14: Software Sequence Diagram – Parent device (view alert, advice)
Figure 15: Software Sequence Diagram – Parent device (settings)
Figure 16: Software Sequence Diagram – Parent device (settings 2)
Figure 17: Activity Diagram – Parent device (create account)
Figure 18: Activity Diagram – (Login)
Figure 19: Activity Diagram – Parent device (Add child)
Figure 20: Activity Diagram – child device (get permission)
Figure 21: Activity Diagram – (detect inappropriate message)
Figure 22: Activity Diagram – parent device (save message)
Figure 23: Activity Diagram – parent device (get advice)
Figure 24: Activity Diagram – parent device (delete account)
Figure 25: Class Diagram of Etiqa’a | اتقاء system
Figure 26: Project management – Trello program card
Figure 27: Waterfall model
Figure 28: Component architecture of Etiqa’a | اتقاء system
Figure 29: Entity Relationship Diagram (ERD) of Etiqa’a | اتقاء system
Figure 30: Mapping ERD to Relational Schema of Etiqa'a | اتقاء system
Figure 31: Interface Structure Design (ISD) of Etiqa’a | اتقاء system
Figure 32: 0- About the application interface
Figure 33: 1- Device selection interface
Figure 34: 2- Create an account or login selection interface
Figure 35: 2.1- Create an account interface
Figure 36: 2.2- Login interface
Figure 37: 2.1.1- Entering verification code interface
Figure 38: 2.1.1.1- Confirmation interface
Figure 39: 2.2.1- Welcoming interface
Figure 40: 2.2.1.2- Parent's homepage with child added interface
Figure 41: 2.2.1.1- Parent's homepage with no child added interface
Figure 42: 2.2.1.2.1- Alert in more detail interface
Figure 43: 2.2.1.2.2- Advice category interface
Figure 44: 2.2.1.2.2.1- Specific category advice interface
Figure 45: 2.2.1.2.3- Account settings interface
Figure 46: 2.2.1.2.3.1- Account info interface
Figure 47: 2.2.1.2.3.2- Children list interface
Figure 48: 2.2.1.2.3.3- Alert history interface
Figure 49: 2.2.1.2.3.4- Help center interface
Figure 50: 2.2.1.2.3.1.1- Edit account info interface
Figure 51: 2.2.1.2.3.1.1- Delete warning window
Figure 52: 2.2.1.2.3.2.3- Add child interface
Figure 53: 2.2.1.2.3.2.2- Not activated child's device interface
Figure 54: 2.2.1.2.3.2.1.1- Edit not activated child's device interface
Figure 55: 3.2- Choose a child parent's homepage with child added
Figure 56: 3.2.1- Inactive child interface
Figure 57: 3.2.1.1- Permission window
Figure 58: 3.2.1.1.1- Activation terminated interface
Figure 59: 2.2.1.2.3.4.1- Application instructions interface
Figure 60: 2.2.1.2.3.4.2- Instructions for adding a child interface
Figure 61: 2.2.1.2.3.1.1.1- Confirm account deletion interface
Figure 62: Machine Learning Model
Figure 63: Data Distribution
Figure 64: Accuracy Formula [59]
Figure 65: Confusion Matrix of SVM Model
Figure 66: Confusion Matrix of NB Model
Figure 67: Confusion Matrix of LR Model
Figure 68: Confusion Matrix of RF Model
Figure 69: Confusion Matrix of DT Model
Figure 70: Confusion Matrix of KNN Model
Figure 71: F1-Score Formula [59]
Figure 72: Table structure for Parent table
Figure 73: Table structure for Child table
Figure 74: Table structure for WhatsApp message table
Figure 75: Table structure for Advice table
Figure 76: delete_message_after_two_weeks event in etiqaa database
Figure 77: About the application interface
Figure 78: Device selection interface
Figure 79: Login interface (Parent's device)
Figure 80: Create account interface (Parent's device)
Figure 81: Verification interface
Figure 82: Confirmation interface
Figure 83: Welcoming interface
Figure 84: Homepage with no child added (Parent's device)
Figure 85: Homepage with child added (Parent's device)
Figure 86: Alert in more details interface
Figure 87: Advice category interface
Figure 88: Specific category advice interface
Figure 89: Account settings interface
Figure 90: Account info interface
Figure 91: Children list interface
Figure 92: Alert history interface
Figure 93: Help center interface
Figure 94: Edit account info interface
Figure 95: Delete warning window
Figure 96: Add child interface
Figure 97: Inactive child interface (Child device)
Figure 98: Permission window (Child device)
Figure 99: Activation terminated interface (Child device)
Figure 100: Application instructions interface
Figure 101: Instructions for adding child interface
Figure 102: Confirm account deletion interface
LIST OF TABLES
Table 1: Comparison between Etiqa’a | اتقاء and current systems
Table 2: Example of ANLP Operations [4]
Table 3: Use Case Scenario for Create account
Table 4: Use Case Scenario for login
Table 5: Use Case Scenario for give permission
Table 6: Use Case Scenario for add a child
Table 7: Use Case Scenario for get advice
Table 8: Use Case Scenario for receive alerts
Table 9: End-User Characteristics
Table 10: labeling samples
Table 11: part of the Gulf comments in the dataset
Table 12: Top 10 most frequent ‘dangerous’ seeds and emojis in REST API dataset
Table 13: binary labels of the dataset
Table 14: Example for Dataset Before Unifying Labels
Table 15: Example for Dataset After Unifying Labels
Table 16: Sample of Cleaning Results
Table 17: The Result of Stemmers and Lemmatizers
Table 18: Sample of Processing Results
Table 19: Example of N-gram [54]
Table 20: Accuracy Score for Training Models Using TF-IDF
Table 21: Accuracy Score for Training Models Using Count Vectorizer
Table 22: Acceptance Testing
Chapter 1: INTRODUCTION
1.1 Purpose of the Project
1.2 Purpose of this Document
1.3 Scope of the Project
1.4 Project Description
1.5 Existing Systems and their problems
1.5.1 Bark
1.5.2 Qustodio
1.5.3 AirDroid Parental Control
1.6 Overview of this Document
Introduction
Nowadays, more and more young people use and misuse technology. Unfortunately, criminals are
concentrating their efforts on tracking down vulnerable people, such as minors, to contact them through
social media. They usually use deception to lure their victims and commit crimes. Although young people
use WhatsApp regularly to communicate with friends and family, there have been incidents in which
minors have been subjected to threats such as sexting, child pornography, and bullying [1] [2]. These
risks, if not identified and addressed in a timely manner, may cause the child physical and psychological
harm. In this context, it becomes critical for applications to include systems that alert users to the presence of dangers. A few applications help parents monitor private messages on social media, but almost all of them lack Arabic language support. This is a significant gap, because Arabic is the official language of 22 countries and is spoken as a first language by part of the population in 11 other countries, making it the first language of more than 422 million people [3].
Arabic is an Afro-Asiatic language that originated in the Middle East. Modern Standard Arabic (MSA), one of several varieties of Arabic, is used in official communications and in spoken journalism and media. The Holy Quran, literary works, and poetry were written in another variety, known as Classical Arabic (CA). A third category is the colloquial dialects, which differ depending on the region [4].
The Arabic alphabet has 28 letters, which are written from right to left. The letters take on different forms depending on where they are in a word. For example, the letter Ain (ع) looks like this (عـ) at the beginning of a word, like this (ـعـ) in the middle of a word, and like this (ـع) at the end [5].
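The positional forms described above can be illustrated with a short sketch. It assumes the Unicode "Arabic Presentation Forms-B" code points for Ain and uses only Python's standard `unicodedata` module; the helper name `to_base_letter` is hypothetical and is not part of the project. Ordinary text stores the single base letter and lets the renderer pick the shape, so mapping presentation forms back to the base letter is one possible normalization step before text classification.

```python
import unicodedata

AIN = "\u0639"  # the base letter Ain as stored in ordinary Arabic text

# Unicode "Arabic Presentation Forms-B" code points for the four shapes of Ain.
AIN_FORMS = {
    "isolated": "\uFEC9",
    "final": "\uFECA",
    "initial": "\uFECB",
    "medial": "\uFECC",
}


def to_base_letter(ch: str) -> str:
    """Map a positional presentation form back to its base letter via NFKC."""
    return unicodedata.normalize("NFKC", ch)


for position, shape in AIN_FORMS.items():
    # Print the position, the glyph, its official Unicode name, and the base letter.
    print(position, shape, unicodedata.name(shape), "->", to_base_letter(shape))
```

All four shapes normalize to the same base letter, which is why a classifier can treat them as one character.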
The Arabic language is the fourth most used language on social media platforms [3], so it is necessary to have an application that can detect inappropriate messages in Arabic. Our project therefore aims to create an application that evaluates the content of messages, classifies them as appropriate or inappropriate, and notifies parents as soon as possible when an inappropriate message is detected. This saves parents from having to read the child's private messages and invade their privacy.
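The flow described above, in which parents are notified only about messages classified as inappropriate, can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the `Alert` type, `build_alerts`, and the keyword-based `toy_classifier` are hypothetical names, and the keyword check merely stands in for a trained model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Alert:
    sender: str
    text: str


def build_alerts(messages: Iterable[Tuple[str, str]],
                 is_inappropriate: Callable[[str], bool]) -> List[Alert]:
    """Keep only flagged messages; all other messages stay private."""
    return [Alert(sender, text)
            for sender, text in messages
            if is_inappropriate(text)]


# Hypothetical stand-in for a trained classifier: a tiny keyword check.
FLAGGED_WORDS = {"غبي"}  # "stupid"; toy list for illustration only


def toy_classifier(text: str) -> bool:
    return any(word in FLAGGED_WORDS for word in text.split())


messages = [
    ("friend", "كيف حالك اليوم"),  # "how are you today"
    ("stranger", "انت غبي"),       # "you are stupid"
]
alerts = build_alerts(messages, toy_classifier)
print(len(alerts), alerts[0].sender)
```

Passing the classifier in as a function keeps the alert logic independent of whichever model is used to flag messages.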
Our system classifies messages as inappropriate based on the definition provided by the Oxford Dictionary: behavior or language that is not suitable, such as sexual harassment, or anything that causes damage or injury to a person, such as bullying. Our project is significant because it seeks to keep minors safe on social media. For the time being, we will concentrate on WhatsApp. In the future, however, we plan to enhance and expand this project to detect inappropriate messages in Arabic on all social media platforms, including Twitter, Instagram, and Facebook.
We believe that by detecting inappropriate content in private social media messages and making it easier for parents to know if anything inappropriate is being sent to their children, our system will make social media a safer place for young Arabs.
Etiqa’a | اتقاء is a mobile application for Android that automatically monitors children's messages on WhatsApp using machine learning. The application serves Arab parents who have children aged 7 to 16.
Social media bullying has expanded into a huge, hotly debated topic, and the Arab world is becoming more aware of cyberbullying. According to the Cyberbullying Report [6], about 60% of Gulf Arab children openly admit the occurrence of cyberbullying among their peers. In addition, in 2018 Moafa et al. reported the percentage of cyber harassment by age group in the Kingdom of Saudi Arabia on the web and online networking platforms. As Figure 1 shows, the dark color occupies the largest percentage; it represents the cyber harassment of people from the age of 15 years until their current age [7].
A 2021 study by Fahmy [8] involving 279 children aged 12 to 19 found that WhatsApp accounted for 61.6% of all children's social networking site usage, followed by Facebook (53%) and Instagram (52.3%). According to 60.4% of the participants, receiving bad messages and images is one of the most significant forms of bullying they experience, while 45.6% experienced receiving threatening and intimidating messages. Because it is illegal to monitor those who are older than the legal age, we restricted our age range to those not over 18.
In 2018, Norton & Symantec commissioned an online survey of 6,986 parents around the world, conducted by the research firm Edelman Intelligence. It included responses from 1,012 parents in KSA and the UAE aged 18 and up with children aged 5 to 16. The report showed that 31% of UAE children and 34% of KSA children aged 5 to 10 have smartphones of their own, while in the 11 to 16 age group 70% of children in the UAE and 74% in KSA own their own smartphones. When asked whether smartphones have a negative impact on their children (such as on their mental health and social skills), 52% of UAE parents and 53% of KSA parents said that they do. The report also showed that 75% of UAE parents and 76% of KSA parents are worried about the online threats their children are vulnerable to; in fact, parents in KSA say that they are significantly more concerned about online bullying than bullying at school. When asked how strict they are about their child's use of smartphones, UAE parents reported a strictness level of 74% and KSA parents a level of 73%. When asked whether they use any parental control apps, 54% of UAE parents and 55% of KSA parents said that they have set up parental controls. The report also showed that 71% of UAE parents and 70% of KSA parents want to set limits and parental controls but do not know how to go about doing so, and 82% of UAE parents and 81% of KSA parents said they wish they had more support and advice when it comes to protecting their children online [9].
We published an online survey to better identify the target age group and to see whether parents can look through their children's devices. About 280 people responded. The survey included questions such as whether they would use the Etiqa’a application and whether they considered it ethical.
Based on the survey, we found that 77.1% of parents have access to their children’s devices: 34.3% of parents said it is ethical to go through their children’s devices without their knowledge as a form of protection, and 42.8% said that their children are aware that their devices are being monitored.
The parents were asked whether their children use WhatsApp, in order to identify the age groups of young WhatsApp users. We found that 48.28% of children aged 7 to 11 use WhatsApp, while 88.10% of children aged 12 to 17 use it, as shown in Figure 2.
For this reason, our age range starts at 7, to provide protection to young children using WhatsApp.
The parents were also asked whether they would use the Etiqa’a application, and 89.4% said they would. Of the parents who said that they do not go through their children’s devices because of privacy concerns, 85.19% said they would use the Etiqa'a application, commenting that it is ideal because it would protect their children's privacy and give the parents peace of mind at the same time.
Among the parents who said they do not have access to their children’s devices, 56.92% had children older than 17. For this reason, we limit our scope to children aged 16 and younger. We concluded that our age range is 7 to 16: children younger than 7 are excluded because of their low WhatsApp usage (22%), ages 7 to 16 are included because they use WhatsApp and their parents have access to their devices, and those aged 17 and older are excluded because their parents do not have, or do not want, access to their devices.
[Figure 2: Percentage of minors’ accessible devices and their usage of WhatsApp. Vertical axis: 0% to 100%. Age groups: younger than 7, 7-11 years old, 12-17 years old, older than 17.]
Our application uses machine learning to detect inappropriate Arabic words, and its interfaces are in Arabic to suit our community. It can monitor private messages, not just public data, while protecting the privacy of minors, because parents see messages only when needed rather than all of them.
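To make the idea of machine-learning-based detection concrete, the following is a minimal sketch of logistic regression over word counts, written in plain Python. It is not the project's model: the four training sentences are toy data invented for illustration, and the function names, the whitespace tokenization, the learning rate, and the number of epochs are all assumptions.

```python
import math
from collections import Counter


def predict_proba(weights, bias, text):
    """Probability that `text` is inappropriate under the current weights."""
    counts = Counter(text.split())
    z = bias + sum(weights[token] * n for token, n in counts.items())
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=200, learning_rate=0.5):
    """Fit one weight per word by stochastic gradient descent on the log loss."""
    weights, bias = Counter(), 0.0  # unseen words default to weight 0
    for _ in range(epochs):
        for text, label in samples:
            error = predict_proba(weights, bias, text) - label
            for token, n in Counter(text.split()).items():
                weights[token] -= learning_rate * error * n
            bias -= learning_rate * error
    return weights, bias


# Toy data for illustration only: label 1 = inappropriate, 0 = appropriate.
TOY_SAMPLES = [
    ("كيف حالك اليوم", 0),    # "how are you today"
    ("شكرا لك يا صديقي", 0),  # "thank you, my friend"
    ("انت غبي", 1),           # "you are stupid"
    ("سوف اضربك", 1),         # "I will hit you"
]

weights, bias = train(TOY_SAMPLES)
print(predict_proba(weights, bias, "انت غبي") > 0.5)
print(predict_proba(weights, bias, "كيف حالك اليوم") > 0.5)
```

A message is treated as inappropriate when the predicted probability exceeds 0.5. A real system would need a large labeled dataset and Arabic-specific preprocessing rather than whitespace splitting.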
This section discusses three parental control applications: "Bark", "Qustodio", and "AirDroid Parental Control".
1.5.1 Bark
Bark is a parental control app that uses machine learning algorithms, leaning most heavily on deep learning with neural networks, in addition to contextual analysis and natural language processing, to monitor messages and media effectively and accurately across 32 apps. It alerts parents to any potential issues it finds, and even emojis are monitored [10]. However, its algorithms are trained only in English, Spanish, and Afrikaans, so it does not support the Arabic language. Its monitoring capabilities also differ depending on the platform and the app being monitored. For example, Bark can only monitor posts on the child’s Facebook timeline; for Instagram, it can monitor the child's posts and direct messages (DMs) on Android devices but cannot monitor DMs on iOS devices. Its alerts are not real-time and may take 15 minutes on average [11].
Thanks to artificial intelligence, children are able to maintain their privacy, and parents do not need to spend a lot of time reading every message. In addition, Bark provides recommendations on its blog to help parents deal with the issues raised in the alerts they receive about their children, and it can block access to specific apps, but it provides no control within the monitored accounts themselves [11].
The child app is installed through sideloading: it is a separate app downloaded from the website. This could potentially compromise the security of the mobile device, because the software is not thoroughly vetted and approved by the teams at Apple or Google. No rooting is required [11].
1.5.2 Qustodio
Qustodio is a cross-platform parental control tool that provides many features to help parents keep an eye on the online activities of their children. We concentrate on the one feature that is similar to our project, which is text/SMS monitoring. Unfortunately, Qustodio stopped providing social media monitoring, as it used to do with Facebook; it now monitors only SMS messages. It monitors them directly but requires sideloading the app and rooting the device [12].
Qustodio does not use any kind of artificial intelligence to monitor messages; instead, it reveals everything to the parents without restriction. It gives access to the full content of sent and received messages, the identity of the senders, and the times at which messages were sent or received. This allows parents to violate the privacy of their children and wastes their time reading and tracking a huge number of messages. Moreover, the current generation no longer uses SMS messages as much as social networking to communicate with each other; in other words, this feature is useless these days [12].
The screen mirroring feature is comprehensive and condenses many of the kids' activities into one
view, but it does not respect the children's privacy. It is also very difficult for busy parents to follow
their kids' activities constantly at the moment they happen: the feature is live monitoring only, so it
does not save or track any activity unless the parent is connected to the kid's screen [13].
Table 1: Comparison between Etiqa'a | اتقاء and current systems
Messages Monitoring ✔ ✔ ✔ ✔
Smart Monitoring ✔ ✖ ✖ ✔
Suspicious Content Alert ✔ ✖ ✖ ✔
Protect Children Privacy ✔ ✖ ✖ ✔
Arabic Language Support ✖ ✖ ✖ ✔
Free ✖ ✖ ✖ ✔
Real-Time ✖ ✖ ✖ ✔
1.6 Overview of this Document
This document is structured as follows:
Chapter 1: This chapter provides an introduction, description, scope of the project, and existing
systems and their problems.
Chapter 2: This chapter provides background about machine learning, natural language
processing, and literature review.
Chapter 3: This chapter provides system requirements, including functional and non-functional
requirements, with diagrams (use case diagram, sequence diagram, activity diagram, class
diagram, and DFDs).
Chapter 4: This chapter provides design considerations, including design constraints from the user
and from the system's software and hardware environment, in addition to the architectural strategies
used, which are algorithms, project management, and the methodology for developing the project.
Chapter 5: This chapter provides system design, including architecture, database system, and
interface design.
Chapter 6: This chapter showcases the datasets that are used to train the model and all their
details.
Chapter 7: This chapter provides the development tools of the system, the machine learning model
implementation, the database implementation, the mobile application implementation, and the
model integration with the application.
Chapter 8: This chapter provides the testing stage of the Etiqa'a | اتقاء application, the types of
tests that were done, and their results.
Chapter 9: This chapter provides the conclusion of the project, the challenges we faced while
making the Etiqa'a | اتقاء application, and what we plan to do with the application in the future to
improve it.
Chapter 2: BACKGROUND
2.1 Machine Learning
2.2 Natural Language Processing
2.3 Arabic Natural Language Processing based on Machine Learning
2.4 Methods to get WhatsApp messages
2.4.1 Extract WhatsApp messages from the phone's local storage
2.4.2 Extract WhatsApp messages from notifications
2.4.3 Scrape messages from WhatsApp web
2.5 Literature Review
BACKGROUND
This chapter provides background on the technologies used in the application and describes what
they mean, some of their uses, and the different types of algorithms used in each of them.
The topics covered in this chapter are Machine Learning, Natural Language Processing,
Arabic Natural Language Processing based on Machine Learning, and methods to get WhatsApp messages,
including how to extract messages from notifications. It also contains a Literature Review in which we
discuss other existing models and systems that have some similarities to our application.
2.1 Machine Learning
Machine learning seeks to develop algorithms that enable computers to learn. Learning is the
process of discovering statistical regularities or other data patterns.
Machine learning algorithms are designed to mimic the human approach to learning a task. They
are divided into two broad categories based on the desired outcome of the algorithm: supervised learning
and unsupervised learning [14].
Supervised algorithms can be trained for regression or for classification.
Regression algorithms are used when the output variable is continuous and depends on the input
variables. They are employed in forecasting continuous quantities, such as the weather.
Classification algorithms are used when the output variable is categorical, that is, when it takes
one of a fixed set of classes. Examples of such classes include Male-Female and Yes-No.
Support Vector Machines, Naive Bayes, and Decision trees are supervised learning algorithms
that are used for Arabic text classification.
Support Vector Machines (SVMs): an SVM is a supervised learning algorithm that can be used for
both classification and regression, but it is mostly used for classification [16]. SVMs have been used
successfully in many pattern recognition problems in areas such as bioinformatics and biometrics. They
have achieved some of the best results in text categorization and are widely used in NLP-related problems
in different languages, including Arabic, for tasks such as readability prediction and sentiment analysis
[4]. The model represents each dataset item as a point in space whose number of dimensions equals the
number of features, and each coordinate of the point is the value of the corresponding feature. The
classification is then performed by finding a hyperplane that separates the data classes. The training
data is made up of labeled documents represented in a vector space. Representing documents in a vector
space of limited dimension also helps in analyzing text, data, and documents in order to compute the
degree of similarity between them. The algorithm is based on linear algebra and, compared with other
methods, it is relatively easy to use.
However, SVM has some limitations that may deter some users from using it. Synonyms are a major
obstacle, because the Arabic language has many synonyms for each word. Another challenge is the
assumption that terms are statistically independent, whereas in fact most Arabic terms are closely
related [16].
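The hyperplane idea described above can be illustrated with a minimal sketch. It assumes two numeric features per item and labels of +1 and -1, and it trains a linear SVM by sub-gradient descent on the hinge loss. The function names, the toy points, and the hyperparameters are hypothetical and are not taken from the cited works.

```python
def train_linear_svm(points, labels, epochs=20, lr=0.1, reg=0.01):
    """Fit a separating hyperplane w.x + b = 0 with hinge-loss sub-gradient descent."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is inside the margin or misclassified: move the hyperplane.
                w = [wi + lr * (y * xi - reg * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely classified: only apply the regularization shrink.
                w = [wi - lr * reg * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable data (hypothetical feature values).
POINTS = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -3.0)]
LABELS = [1, 1, -1, -1]
W, B = train_linear_svm(POINTS, LABELS)
```

In a text classifier the coordinates would be word features of a message rather than hand-picked numbers, but the decision rule is the same sign test on w.x + b.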
Naive Bayes (NB): NB was one of the first classification algorithms used in text classification. It
is the most straightforward classifier and the second most commonly used. The Naive Bayes classifier is a
popular text categorization technique that assigns documents to categories such as spam or
legitimate [17], or positive, negative, or neutral.
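As a rough illustration of how a Naive Bayes text classifier assigns a document to a category such as spam or legitimate, the following sketch implements a multinomial Naive Bayes with Laplace smoothing in plain Python. The training sentences, labels, and function names are made up for the example and are an assumption, not the data or code of this project.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs is a list of (tokens, label) pairs; returns the counts the classifier needs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Pick the label with the highest log prior plus summed log word likelihoods."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            # Laplace (add-one) smoothing keeps unseen words from zeroing the product.
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set.
TRAIN = [
    ("win free prize now".split(), "spam"),
    ("free money win".split(), "spam"),
    ("meeting at noon tomorrow".split(), "legitimate"),
    ("see you at the meeting".split(), "legitimate"),
]
MODEL = train_nb(TRAIN)
```

The "naive" part is the assumption that words are independent given the label, which is why the per-word log likelihoods can simply be added.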
Decision trees: Decision tree classifiers, like the Naive Bayes classifier, produce excellent spam
detection results. Because the model is simple, the decision tree classifier is a popular machine learning
technique. Marie-Sainte et al. [4] report that Abdallah et al. combined a rule-based approach with a
decision tree in a hybrid approach that outperformed existing classifiers for the Arabic Named Entity
Recognition application.
The decision tree algorithm considered here is a Java-based implementation of the C4.5 algorithm.
It divides data based on attribute values. Depending on the testing set phase, it can be used for
classification in the form of generated rules or decision trees. Its most important feature is its ability
to learn disjunctive expressions, which are suitable for document classification. However, the decision
tree algorithm has some drawbacks, such as the interference of irrelevant features, which may have a
negative impact on its performance [18]. Aboalnaser [16] presents an idea for using a decision tree to
recognize the fonts of some popular Arabic words. Once these fonts are identified, they are generalized to
lines, paragraphs, or neighboring non-common words, because these components of a text almost always use
the same font. A total of 48 features were used, and 36 fonts were discussed in the paper. The precision
was 90.8 percent, while other fonts reached 100 percent accuracy. The average time needed for recognition
was about 0.30 seconds.
Unsupervised learning does not require any labeled data as input. The model learns on its own
from the data that you provide. The data is not labeled in this case, but the algorithm helps the model
form clusters of similar types of data [15].
For example, if we have dog, duck, and cat data, the model will process the data and train itself
on it. Because it has no prior experience with the data, it will form clusters based on feature
similarities. Cats and dogs, which have similar characteristics, will end up in the same cluster, as shown
in Figure 4.
Unsupervised learning models, which perform the three tasks of clustering, association, and
dimensionality reduction, are useful tools for working with massive amounts of data.
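The dog, duck, and cat example can be sketched with a very small k-means clustering routine. The three features used here (number of legs, has feathers, has fur) are hypothetical and chosen only for illustration; no labels are given to the algorithm.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_two_clusters(points, iterations=10):
    """Group points into two clusters and return the cluster index of each point."""
    # Deterministic start: the first point and the point farthest from it.
    centroids = [points[0], max(points, key=lambda p: squared_distance(p, points[0]))]
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(2), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(2):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment


# Hypothetical features: (number of legs, has feathers, has fur).
ANIMALS = ["dog", "cat", "duck"]
FEATURES = [(4, 0, 1), (4, 0, 1), (2, 1, 0)]
CLUSTERS = dict(zip(ANIMALS, kmeans_two_clusters(FEATURES)))
```

With these features the dog and the cat fall into one cluster and the duck into the other, purely because of feature similarity.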
2.2 Natural Language Processing
Natural language processing (NLP) is a field of study that analyzes language for its meaning and
can be used to create an intelligent machine that interacts like a human being. The language that
individuals naturally speak, such as Arabic, is called natural language. An NLP system accepts a string of
sentences as input and produces structured representations that capture their meaning [19].
Natural Language Understanding (NLU) and Natural Language Generation (NLG) are the two
subfields of NLP. NLU enables machines to understand natural language and evaluate it by extracting
concepts, entities, emotions, keywords, etc. NLG is a method for creating meaningful clauses, sentences,
and paragraphs from an internal representation [20].
• Multilingualism: 7100 languages are spoken throughout the world; some simply work
differently, and each has its own grammar. Therefore, various methods are required to develop
language models that can be used with all of these languages [21].
• Ambiguity: the capacity of an expression to have more than one meaning or to be understood in
more than one way. There are different kinds of ambiguity, for example:
1- Lexical ambiguity: a word can have more than one meaning. For instance, "the boy was
crying" and "the student may leave the class". The verb "cry" is unclear in the first phrase.
It can mean two things. It could be a reference to shouting or weeping. Therefore, it's
unclear from this statement whether the boy was weeping or shouting. Due to its dual
meaning, the auxiliary verb in the second sentence causes ambiguity. In one sense, the
student's exit from the class is feasible but not certain. In the second interpretation, the
teacher has given the student permission to leave.
2- Structural ambiguity: a sentence can have two possible meanings because of its structure. For
instance, "The man saw the girl with a telescope." This sentence can mean two different things.
The first, "the man saw (the girl) (with a telescope)", indicates that the man was looking
through the telescope to see the girl. The second, "the man saw (the girl with a telescope)",
indicates that it was the girl who was carrying the telescope. This demonstrates that
structural ambiguity is not caused by the words themselves, but rather by the possible
relationships between the elements that make up the sentence [22] [19].
2.3 Arabic Natural Language Processing (ANLP) based on Machine Learning
As mentioned in the previous section, natural language processing helps the machine understand
human speech and derive the correct meaning from context, which increases its effectiveness and accuracy
and produces better results. However, processing the Arabic language is one of the biggest challenges,
for several reasons identified by the CAMeL team:
1- Variation in language forms: Arabic has three forms, which are Classical Standard Arabic used
in the Qur'an, the formal modern form, and the informal form used in dialects, which vary as well.
2- Morphological richness: thousands of conjugations can be derived from one word root.
3- Orthographic ambiguity: sometimes it is hard to determine a word's meaning without
diacritics. For example, there are two possible meanings of the sentence "ضرب أحمد محمد". It is
very challenging to determine who beat whom: did Ahmed beat Mohammed, or was Ahmed
beaten by Mohammed? We cannot distinguish the subject from the object in the sentence
because there are no diacritics.
Many techniques have emerged for Arabic natural language processing. Table 2 shows an
example of the process, which goes through several steps:
Step | Input | Output
Data Cleaning | ذهب أحمد إلى المدرسة (Ahmed went to school) | ذهب أحمد المدرسة
Normalization | ذهب أحمد إلى المدرسة | ذهب احمد الى المدرسه
Tokenization | ذهب أحمد إلى المدرسة | [ذهب، أحمد، إلى، المدرسة]
Stemming | (المكتبة، الكاتب، الكتاب) (The library), (the writer), (the book) | كتب (Wrote)
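The cleaning, normalization, and tokenization steps of Table 2 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: cleaning is taken to mean removing diacritics, the tatweel character, and non-Arabic symbols, and normalization maps the hamza forms of alef to a bare alef and ta marbuta to ha, as in the normalization row of the table. The function names are hypothetical, and stemming is left out because it needs a morphological tool.

```python
import re

# Arabic diacritics (tashkeel) U+064B..U+0652 and the tatweel character U+0640.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Anything that is neither an Arabic letter (U+0621..U+064A) nor whitespace.
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")
# Alef with hamza above/below and alef madda -> bare alef; ta marbuta -> ha.
LETTER_MAP = str.maketrans({
    "\u0623": "\u0627",
    "\u0625": "\u0627",
    "\u0622": "\u0627",
    "\u0629": "\u0647",
})


def clean(text):
    """Remove diacritics, punctuation, digits, and extra whitespace."""
    text = DIACRITICS.sub("", text)
    text = NON_ARABIC.sub(" ", text)
    return " ".join(text.split())


def normalize(text):
    """Unify letter variants so that different spellings map to one form."""
    return text.translate(LETTER_MAP)


def tokenize(text):
    """Split the text into word tokens on whitespace."""
    return text.split()


def preprocess(text):
    """Run the three steps in order and return the list of tokens."""
    return tokenize(normalize(clean(text)))
```

For example, `preprocess("ذهب أحمد إلى المدرسة!!")` strips the punctuation, rewrites أحمد as احمد and المدرسة as المدرسه, and returns the four words as separate tokens.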
2.4 Methods to get WhatsApp messages
The fact that WhatsApp does not keep a record of your communication history on its servers is
another crucial aspect of the service. WhatsApp stores a message you send on its servers only until the
recipient receives it, or for a maximum of 30 days. Even then, WhatsApp cannot read the message because
of end-to-end encryption. A message is kept on WhatsApp's servers only if the intended recipient is
unable to receive it (maybe because they are offline or don't have a WhatsApp account). In such
situations, WhatsApp automatically deletes the message from its system after 30 days. Given this, you
may be wondering where your communications are kept if WhatsApp does not keep them on its servers.
WhatsApp messages are kept locally on the phone as encrypted backups.
2.4.1 Extract WhatsApp messages from the phone's local storage
The location of WhatsApp messages on an Android device depends on how the device is
configured, so we need to look through the settings to see where the messages are kept.
By default, WhatsApp makes a local backup every day at 2:00 a.m. (on Android). The backup can
be found in the phone's file system under "WhatsApp/Databases." The program stores a week's worth of
backups, each labeled with a date. However, we cannot read any of the information, because these
backups are encrypted.
To get to the file we must root the phone. Gaining root access essentially gives the user more
permissions. With root access, you can launch specialized programs that need administrator-level
permissions, change or replace system applications and settings, and carry out other tasks that are
otherwise unavailable to regular Android users. However, rooting a phone could void the product's
warranty. It can also render the device unstable or, if done incorrectly, disable the device entirely.
Therefore, this technique was excluded.
WhatsApp introduced message encryption as one of its security features in September 2012. This
was done to stop session hijacking and packet analysis, which occurred frequently in the past.
WhatsApp backup data is encrypted using the crypt2, crypt5, crypt7, crypt8, crypt12, and crypt14
formats. This makes it nearly impossible to read chat messages by hacking the database files.
WhatsApp updates its chat backup file (the crypt14 file) every night at 2 a.m. New messages
would therefore only become readable from the saved crypt14 file after each nightly update, so reading
messages this way would not be instantaneous.
2.4.2 Extract WhatsApp messages from notifications
We can read the notifications posted by Android applications, such as low-battery notifications
or message notifications from Facebook or WhatsApp. We can even read call notifications.
To read the notifications we need to use the NotificationListenerService class, which lets us
listen for when notifications are posted or removed.
2.4.3 Scrape messages from WhatsApp web
Using Selenium WebDriver, which can automate browser actions, we can write code that reads
WhatsApp messages and uses a browser profile so that the QR code is scanned only once. First, set up a
browser profile and manually log in using a number (one time). Second, launch the web driver with this
browser profile, so that web.whatsapp.com does not prompt for a QR scan again.
Due to time constraints we could not make this method work, although we consider it the best
one, so we used the notification method instead to read the messages.
2.5 Literature Review
Moreno et al. [29] conducted research in 2019 in which they developed a parental control app
on the Android platform to monitor messages sent and received on WhatsApp on teens' phones. The
application analyzes and classifies messages using natural language processing to detect threats such as
drugs, sex, and bullying and to alert the parents. However, the researchers used an insecure method to
access private WhatsApp messages, leaving behind a security breach that could put the device at risk.
Also in 2019, a group of Arab researchers [30] developed a real-time tool that parents can use to
monitor their kids' activities on Twitter and that alerts them directly when any form of bullying is
detected in Arabic tweets. The tool's detection is limited to public messages, or "tweets", and its
warnings are limited to bullying messages only.
Jamoussi et al. [31] conducted research in 2022 in which they propose a child protection app that
protects children from online threats: it analyzes children's online activities, monitors frequently used
dangerous vocabulary, and informs parents of the harm. However, the application has not been implemented
yet.
AlGhamdi and Khan conducted a review in 2020 [32] that presents a system for finding
suspicious messages in Twitter tweets. The system tokenizes and preprocesses a dataset collected from
tweets, and the dataset is then used to train the classifier using six machine learning algorithms:
k-nearest neighbors, artificial neural networks, linear discriminant algorithm, support vector machine,
decision tree, and long short-term memory networks. A comparison was made between them in terms of
execution speed, accuracy, and confusion matrices. The results showed that SVM achieved the best results,
while ANN achieved the lowest.
Kateb and Kalita conducted research in 2015 [33] that discusses the difficulty of processing
social media (Twitter) data due to its short content, such as tweets and conversations, in contrast to
long documents. The shortness of the content causes inaccurate results. The research also discusses
another challenge, which is the huge flow of data that is constantly being added. It presents some
successful solutions for overcoming both the short-text problem and the data flow.
In 2009, Farghaly and Shaalan conducted research [34] that describes some of the challenges in
Arabic Natural Language Processing (ANLP) and provides some solutions that help overcome them. It also
presents the general features of the Arabic language and its specific characteristics.
Kanan et al. conducted a review in 2019 [5] that examines Arabic machine learning (AML) on
social media and reviews the most used ANLP tools together with AML software on social media to
determine the best tool.
Finally, Alsubait and Alfageh conducted research in 2021 [35] on detecting bullying texts in
YouTube comments, in which they compared machine learning algorithms using three models: Complement
Naïve Bayes (CNB), Multinomial Naïve Bayes (MNB), and Logistic Regression (LR). The experiment was
carried out using two feature extraction methods, Count Vectorizer and Tfidf Vectorizer. Feature
extraction is used in machine learning to reduce dimensionality, and both methods convert text data into
a machine-readable format. The results show that with the Count Vectorizer the Logistic Regression model
outperforms both the Multinomial and the Complement Naïve Bayes models, and that with the Tfidf
Vectorizer Complement Naïve Bayes outperforms the other two models.
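To make the two feature extraction methods concrete, the sketch below builds count vectors and TF-IDF vectors for two short documents in plain Python. It uses the textbook weighting tf × log(N / df), which is an assumption made for simplicity; library implementations such as scikit-learn apply additional smoothing and normalization, so their numbers differ. The example sentences and function names are hypothetical.

```python
import math
from collections import Counter


def count_vectors(docs):
    """Return the sorted vocabulary and one term-count row per document."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    rows = []
    for doc in docs:
        counts = Counter(doc.split())
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def tfidf_vectors(docs):
    """Weight each count by log(N / df) so that terms found everywhere score zero."""
    vocab, rows = count_vectors(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = [sum(1 for row in rows if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    return vocab, [[c * w for c, w in zip(row, idf)] for row in rows]


DOCS = ["you are kind", "you are stupid stupid"]
VOCAB, COUNTS = count_vectors(DOCS)
_, TFIDF = tfidf_vectors(DOCS)
```

Count vectors keep raw frequencies, while TF-IDF down-weights words such as "you" and "are" that occur in every document, which is one reason the two methods can favor different classifiers.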
Chapter 3: SYSTEM ANALYSIS
3.1 System Requirements
3.1.1 Functional and data requirements
3.1.2 Non-functional requirements
3.2 System Diagrams
3.2.1 Use Case Diagram
3.2.2 DFDs
3.2.3 Sequence Diagram
3.2.4 Activity Diagram
3.2.5 Class Diagram
SYSTEM ANALYSIS
This chapter describes the system requirements and diagrams of the software. System
requirements are specified in section 3.1. Diagrams are used to explain the system in section 3.2.
3.1 System Requirements
To gather requirements and to learn which features are most important to include in our
application so that it is distinctive and achieves its intended goal, a questionnaire was published on
social media. About 162 people answered. It was also used to ask the target group for their opinion about
the idea of the application, how much they need it, and whether they are using similar existing
applications (see Appendix A for more details).
3.1.1 Functional and data requirements
1 The first time the application is opened after download, the system will ask the parent whether the
device is the child's device or the parent's device.
2 If "parent's device" is selected, this means the parent is using the application on his/her own device.
2.1.1 The parent can create an account with specific information (name, email,
password, gender); all the fields are required.
2.1.2 The information will be registered in the database only if the email does not
already exist, in other words if it is unique.
2.1.3 The system will send a verification code to the email and will allow the user to
request that the code be resent. The parent then enters the code, and if it matches the
sent code, the message "تم تسجيلك بنجاح" (You have been registered successfully) will be
displayed. If it does not match, the system will display the message
"الرمز لا يطابق الرجاء ادخال الرمز مرة اخرى" (The code does not match, please enter the code again).
2.1.4 If the email is already registered in the database, the system will alert the user
to log in instead of creating a new account.
2.2.1 After creating the account, the parent can log in with the email and password used to
create it.
2.2.2 The system will check the entered information against the accounts stored in the
database.
2.2.2.1 If there is no match, the error message “البريد الإلكتروني غير صالح/كلمة المرور خاطئة”
(Invalid email/wrong password) will be displayed. Otherwise, the parent will log in successfully.
2.2.3 If the parent forgets the password, it can be reset by email: the system will
send a verification message only if the email is registered in the database, and then
the user can change the password.
2.2.3.1 If the email is stored in the database, the system will allow the
parent to change the password.
2.2.3.2 If the email does not exist in the database, the parent will get the error
message “البريد الإلكتروني غير صالح، يرجى التحقق منه وإعادة المحاولة.” (Invalid email, please check it and try again).
2.3.1 The parent can add a child by creating a simple account for him/her using the
name, age, and gender of the child.
2.3.2 The system will check whether the child's account is already registered; if so, it will
display the error message “هذا الطفل مضاف.” (This child has already been added). Otherwise, the
child will be added.
2.3.3 Once the parent has added three children, the “add child” button will disappear.
2.4 The system will show the parent alerts about inappropriate messages from the child's
device. Each alert contains the inappropriate message content, the date and time the message was
sent, and the name of the sender.
2.4.1 The system will ask the parent whether he/she wants to save the alert.
2.4.2 If he/she agrees, the alert will be saved in the alert history; otherwise, it will not
be saved.
2.5 Alerts for all children will be displayed and listed for the parent on the homepage for two
weeks.
2.6 The parent can view the alerts from the history and delete alerts if he/she wants.
2.7 The system gives advice to the parent to help them deal with inappropriate alerts they
received from their child’s messages.
2.8 The system provides a help center for the parent to explain how the application works
and what are the frequently asked questions and their answers.
2.10.1 The parent can delete the child's account and stop the system from monitoring the
child's messages.
3 If "child's device" is selected, this means the parent is using the application on his/her
child's device.
3.1 The parent must have an account, which is used to log in and connect the child's device
with the parent's device. The login process is the same as in step 2.2.
3.2 The system will request the permissions it needs to monitor the child's WhatsApp messages
correctly.
3.2.1 The application will be able to listen to the device notifications after being granted the
required permission.
3.2.2 The system will filter the notifications and collect only WhatsApp message
notifications.
3.3 The application will be hidden on the child's device.
3.4 The system shall be able to detect inappropriate messages on the child's device.
3.4.1 The system will classify the messages as appropriate or inappropriate.
3.4.2 The application shall send alerts to the parent's device only if it detects
inappropriate messages.
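The child-device requirements 3.2.2 to 3.4.2 can be summarized by the following language-neutral sketch, written in Python for brevity. On Android the notifications would come from NotificationListenerService; here they are plain objects. The class names, field names, and the keyword-based stand-in classifier are hypothetical, and `com.whatsapp` is assumed to be the package name that identifies WhatsApp notifications.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, List

WHATSAPP_PACKAGE = "com.whatsapp"  # assumed package name of the WhatsApp Android app


@dataclass
class Notification:
    package: str         # app that posted the notification
    sender: str          # name shown as the notification title
    text: str            # message text shown in the notification
    posted_at: datetime  # when the notification was posted


@dataclass
class Alert:
    message: str       # inappropriate message content
    sender: str        # name of the sender
    sent_at: datetime  # date and time of the message


def build_alerts(notifications: Iterable[Notification],
                 is_inappropriate: Callable[[str], bool]) -> List[Alert]:
    """Keep only WhatsApp notifications and alert only on inappropriate ones."""
    alerts = []
    for n in notifications:
        if n.package != WHATSAPP_PACKAGE:
            continue  # requirement 3.2.2: ignore other apps
        if is_inappropriate(n.text):  # requirement 3.4.1: classify the message
            alerts.append(Alert(n.text, n.sender, n.posted_at))  # requirement 3.4.2
    return alerts


# Hypothetical usage with a keyword check standing in for the trained model.
SAMPLE = [
    Notification("com.whatsapp", "Ali", "you are stupid", datetime(2023, 1, 1, 10, 0)),
    Notification("com.example.other", "Sara", "you are stupid", datetime(2023, 1, 1, 10, 5)),
    Notification("com.whatsapp", "Omar", "see you at school", datetime(2023, 1, 1, 10, 9)),
]
ALERTS = build_alerts(SAMPLE, lambda text: "stupid" in text)
```

Appropriate messages never leave the filter, which is how the sketch reflects the privacy requirement that parents see only inappropriate messages.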
3.1.2 Non-functional requirements
1. Security
• The user can have an account secured with a password to save his/her personal information.
2. Privacy
• The data of each user and its analysis cannot be viewed by other users.
• The system shall protect the child's privacy by not showing all the messages to the parent, but
rather the inappropriate messages only.
3. Availability
• The system shall be available and provide its services to the users 99% of the time each week.
4. Performance
• The performance should be high so that the system can detect messages and alert the parent
in a short time.
5. Usability
• We expect the app interfaces and icons to be clear to the user so that it is easy to understand
how to use the app.
6. Reliability
• The system shall be reliable when classifying messages as inappropriate or not.
7. Portability
• The system shall work on the Android platform.
3.2 System Diagrams
3.2.1 Use Case Diagram
The first use case diagram, shown in Figure 5, represents the main functions of the system
without the create account and login processes. It shows two users interacting with the system. The first
user (parent on his/her device) interacts with the system via three main functions: (1) receive alerts;
(2) add a child, which has an include relation with “save child information”, meaning that the behavior of
the “save child information” use case is part of the “add a child” use case, so when the user adds a child
the system saves the child's information in the database; and (3) get advice.
So, after the user (parent on his/her device) logs in to the system, he/she is able to receive
alerts, add a child to the system, and get advice.
The second user (parent on child's device) interacts with one use case, “give permissions”,
which has an include relation with “read messages”; this means that messages are read only if the user
gives permission. “Read messages” has an include relation with “detect inappropriate messages”, which is
done by the machine learning tool (“ML tool”). “Detect inappropriate messages” has an include relation
with “save inappropriate messages”, which are saved in the database; this means that for the system to
save messages in the database it first has to detect inappropriate messages. Finally, “save inappropriate
messages” has an include relation with “receive alerts”, which means that detecting inappropriate messages
leads to an alert being sent to the parent on his/her device.
Figure 5: Software Use Case Diagram – Etiqa'a | اتقاء system main functions (ESMF).
Figure 6 shows how the parent on his/her device can use the Etiqa'a | اتقاء system. The parent
can (1) create an account, which has include relations with “save user information” and “send
verification email”. This means that “save user information” is part of “create account”, so when the
user creates an account the system saves the user's information in the database, and that “send
verification email” is part of “create account”, so when the user creates an account the system sends a
verification email through the email system. The user can also (2) receive alerts, (3) get advice, and
(4) add a child. These three all have an include relation with “login”, which means that the user must
log in before he/she can receive alerts, get advice, or add a child. “Add a child” has an include
relation with “save child information”, which means that “save child information” is part of “add a
child”, so when the user (parent) adds a child, the child's information is stored in the database.
“Login” has an extend relation with “display error message”, which means that “display error message”
happens only under a certain condition relating to “login”.
The third use case diagram, shown in Figure 7, covers the parent's interaction with Etiqa'a |
اتقاء on the child's device. The user can give permissions, which has an include relation with “login”;
this means that the user has to log in before giving permission. “Login” has an extend relation with
“display error message”, which means that “display error message” happens only under a certain
condition. “Give permissions” has an include relation with “read messages”, which means that messages
are read only if the user gives permission. “Read messages” has an include relation with “detect
inappropriate messages”, which is done by the machine learning tool (“ML tool”). “Detect inappropriate
messages” has an include relation with “save inappropriate messages”, which are saved in the database;
this means that for the system to save messages in the database it first has to detect inappropriate
messages. Finally, “save inappropriate messages” has an include relation with “receive alerts”, which
means that detecting inappropriate messages leads to an alert being sent to the parent on his/her device.
Table 3: Use Case Scenario for Create account
code doesn’t match
• The user acknowledges the error
• The user re-writes the code
• use case continues
• Constraint(s) The email must be unique in the system database
data stored in the database.
• Constraint(s) • Allow the system to access child’s
messages
Table 7: Use Case Scenario for get advice
3.2.2 DFDs
A data flow diagram (DFD) shows how data flows through the system and how it is
processed.
The context diagram in Figure 8 shows the relationship between the Etiqa'a | اتقاء system and
the external entities: the parent on his/her device, the email system, the parent on the child's device,
and the machine learning tool. The parent on his/her device enters his/her information (name, email,
password, and gender) into the Etiqa'a | اتقاء system to create an account, and the system then sends a
verification code to the email. He/she can then log in with the email and password to add a child with
the child's name, age, and gender, and afterwards log in from the child's device with the same account to
grant the permissions to access the WhatsApp message notifications. The machine learning tool then
receives the messages and processes them to produce the classification (appropriate or inappropriate).
After that, the system sends an alert to the parent's device if an inappropriate message is detected. The
alert contains the message content, date, time, and sender name.
The parent can also reset his/her password if he/she forgets it, edit account information, and get
advice on how to deal with the issues raised by the alerts received about his/her children.
Figure 8: Software context level diagram
DFD level 0
The system processes are shown in the Level 0 Data Flow Diagram in Figure 9. In process 1
“Create an account”, the parent on his/her device fills in his/her information (name, email, password,
and gender), and the system sends the verification code to the email. The parent then enters the code,
and if it matches the sent code, a registration confirmation message is displayed and the information is
stored in the parent data store. In process 2 “Login”, the parent enters his/her email and password to
log in to the system, or uses process 3 “Reset password” if he/she forgot the password, in which a
verification message is sent to the parent's email. In process 4 “Add a child”, the parent can add a
child with the child's name, age, and gender, and the information is stored in the child data store. In
process 5 “Listen to device notification”, the parent logs in from the child's device with the account
that he/she created to grant the permissions to access the WhatsApp messages, and a request with the
notification details is then sent to the API. Then process 6 “Call API” sends the messages to the machine
learning tool, which analyzes them to produce the classification result (appropriate or inappropriate),
and the result is stored in the WhatsappMessage data store. After that, in process 7 “Send an alert”, the
parent on his/her device receives an alert if an inappropriate message is detected, with the message
content, date, time, and sender name. In process 8 “Get advice”, the parent can get advice on dealing
with the issues raised by the alerts received about his/her children. In process 9 “Edit account
settings”, the parent can edit his/her account information, which is then updated in the parent data
store.
Figure 9: Software DFD Diagram – Level 0
Figure 10 shows Data Flow Diagram Level 1 for process 7 “Send an alert.” In process 7.1 “generate an
alert,” the system takes an inappropriate message and sends it to the parent's device. In process 7.2
“Display alert list on homepage,” the system displays the list of alerts from the WhatsappMessage data
store on the homepage of the parent's device. The parent can save an alert, in which case the
WhatsappMessage data store is updated to record that the message has been saved. If the parent selects
a specific child, the system displays only that child's alerts in process 7.3 “Display specific child
alert.”
Figure 10: Software DFD Diagram – level 1 - Send an alert
Figure 11 shows Data Flow Diagram Level 1 for process 9 “Edit account settings.” In process 9.1
“Update account information,” the parent provides the information to update his/her account, and the
updates are stored in the parent data store. In process 9.2 “Delete an account,” the parent requests to
delete his/her account, and the account information is deleted from the parent data store. Process 9.3
“Update child information” and process 9.4 “Delete a child” work the same way as the previous two
processes, except that they operate on the child data store. In process 9.5 “Display alert history,” the
system retrieves all the alerts that the parent has saved from the WhatsappMessage data store. In process
9.6 “Display help center,” the system provides a help center that explains how the application works
and answers frequently asked questions. Finally, in process 9.7 “Logout,” the parent can log out of the
application.
Figure 11: Software DFD Diagram – level 1 - Edit account settings.
Figure 12 shows that a parent on his/her device can log in by entering the email and password. The
application checks whether the information is valid by comparing it with the information in the
database: if it is not valid, an error message appears; otherwise, the user is taken to the home page.
The parent can also create an account by entering his/her email, password, and gender. The application
then checks whether the user is valid by comparing it with the users saved in the database: if the user
is not valid, an error message appears; otherwise, the user is saved.
Figure 12: Software Sequence Diagram -Parent device (create account, log in)
Figure 13 shows that the parent on his/her device can add a child by entering the child's name and
age. After checking that the child has not been added before, the application saves the child in the
database. The child's device must then be connected to the application: the parent logs in on the
child's device and chooses the child he/she previously entered from his/her own device, after which the
child's device is connected to the application.
Figure 13: Software Sequence Diagram -Parent device (add child), child device (log in, Give
permission)
Figure 14 shows that the parent on his/her device can view the alerts about inappropriate message
content for all the children he/she has added. If he/she chooses a specific child, only that child's
alerts are shown. The parent can see more information about an alert and can save the alert in the
history.
Figure 14: Software Sequence Diagram -Parent device (view alert, advice )
Figure 15 shows that the parent on his/her device can see his/her account information and can edit
or delete his/her account. The parent can view the children he/she has added and their information, can
edit or delete a child, and can add a child as mentioned before.
Figure 15: Software Sequence Diagram -Parent device (settings)
Figure 16 shows that the parent on his/her device can view the alerts he/she previously saved in the
database, see more information about an alert, and unsave an alert. He/she can also open the help page
if he/she needs any help using the application.
3.2.4 Activity Diagram
Activity diagrams are graphical representations of workflows that include activities and actions to
support choice, iteration, and concurrency.
The first activity diagram, shown in Figure 17, represents the account creation activity on the
parent's device. First, the user selects the main device to create an account. Second, the login page
appears; the user chooses "Register Now," and the account creation page appears for him/her to enter
his/her information. The system then checks whether the information is unique. If it is not unique, the
system shows an error message. Otherwise, the system sends the verification code to the email entered
by the user, and the user enters the code to create the account. If the verification code is valid, the
user's information is saved in the database and a confirmation page for creating the account is
displayed. If it is invalid, the user can re-enter the code or have it re-sent. He/she can also go back
and enter his/her information again.
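The verification step of this activity can be illustrated with a minimal Python sketch. The function names and the 6-digit code length are assumptions made for the example; the report does not state how the code is generated or compared.

```python
import hmac
import secrets

CODE_LENGTH = 6  # assumed length; not specified in the report


def generate_code(length: int = CODE_LENGTH) -> str:
    """Return a random numeric verification code as a string."""
    return "".join(secrets.choice("0123456789") for _ in range(length))


def is_code_valid(sent_code: str, entered_code: str) -> bool:
    """Compare the entered code with the sent one in constant time."""
    return hmac.compare_digest(sent_code.encode(), entered_code.strip().encode())


if __name__ == "__main__":
    code = generate_code()               # would be emailed to the parent
    print(is_code_valid(code, code))     # matching code: account is created
    print(is_code_valid(code, "wrong"))  # mismatch: user re-enters or re-sends
```

Using `secrets` rather than `random` is a design choice for this sketch, since the code acts as a short-lived credential.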
Figure 18 shows a diagram of the login activity on parent's device and child's device. First, the
user will select the device he/she is using; in both cases a login page will appear for the user to enter their
information. Then the system will check whether the information is in the database or not. If the
information is not present, the system will send an error message to re-enter his/her information again. In
the other case, the system will allow the user to log in. If the user is logged in with his/her device, the
user's home page will appear. If it is through the child's device, the inactive children list page will appear
to the user.
Figure 19 shows the activity diagram of a parent adding a child to his/her account. First, the user
can add a child either from the home page or the settings page. If the user chooses to add from the
settings page, the user moves to the children list page and then to the addition page. If the addition
is from the home page, the user is taken directly to the addition page. In both cases, the parent adds a
child by entering the child's information. The system checks whether the child is already registered.
If so, it displays an error message to the user. Otherwise, the child is registered.
Figure 19: Activity Diagram - Parent device (Add child)
Figure 20 shows an activity diagram of how to get permission for the child's device. Firstly, the
parent should log in on the child's device using his/her account. The system will display the children's list
page. Then the parent selects the child and gives permission for the system to listen to device
notifications. Finally, the app will disappear.
Figure 20: Activity Diagram – child device (get permission)
Figure 21 shows an activity diagram for detecting inappropriate messages. At first, the system
will receive input messages. The system will check if the message is inappropriate or not. If the message
is appropriate, the message will be ignored. In the other case, the system will alert the parent.
Figure 21: Activity Diagram – (detect inappropriate message)
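The decision in this activity can be sketched in a few lines of Python. The `Alert` fields follow the alert contents listed in the report (sender, message content, date and time), while the function name and the injected `classify` callable are assumptions for illustration; the real classifier is the trained model described later in the report.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional

APROP = "APROP"          # label used in the report for appropriate text
NOT_APROP = "NOT_APROP"  # label used in the report for inappropriate text


@dataclass
class Alert:
    sender: str
    content: str
    received_at: datetime


def handle_message(sender: str, content: str, received_at: datetime,
                   classify: Callable[[str], str]) -> Optional[Alert]:
    """Ignore appropriate messages; build an alert for inappropriate ones."""
    if classify(content) == APROP:
        return None  # appropriate message: nothing is sent to the parent
    return Alert(sender=sender, content=content, received_at=received_at)


if __name__ == "__main__":
    # Stand-in classifier, used only to exercise the flow.
    def fake_classify(text: str) -> str:
        return NOT_APROP if "insult" in text else APROP

    now = datetime(2023, 1, 1, 12, 0)
    print(handle_message("Friend", "hello", now, fake_classify))
    print(handle_message("Stranger", "an insult", now, fake_classify))
```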
Figure 22: Activity Diagram –parent device (save message)
Figure 23 shows an activity diagram for getting advice. First, the system will display the advice
page to the user. Then the user can select a specific advice type and it will appear with more details.
Figure 23: Activity Diagram –parent device (get advice)
Figure 24 shows an activity diagram for deleting an account. First, the system displays the settings
page, from which the user can access the account information page. The user can then delete his/her
account.
3.2.5 Class Diagram
Class Diagram is a form of static structure diagram that displays a system's classes, attributes,
operations, and relationships between objects to illustrate the structure of the system.
Figure 25 shows the class diagram of the Etiqa'a | اتقاء project, which consists of 6 classes. It
starts from the Parent class: only parents can create an account, using their name, email, password,
and gender. They then verify the email with a verification code that is sent to it using the
emailVerificationCode method, and log in with the email and password. Parents can reset the password if
they forget it, and edit or delete the account. They can also add their children to their account: the
application allows parents to add up to 3 children, using the child's name, age, and gender, through the
Child class. Each child belongs to only one parent, who can edit or delete it.
After children are added, the application listens for all notifications on the child's device and
filters them through the Notification class to keep only WhatsApp notifications. The filtered messages
are then sent to the MessageAnalysis class, where the software processes each message with Arabic
natural language processing (the ANLP method), vectorizes it using a vectorizer, and classifies it with
the model as appropriate or inappropriate.
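A minimal Python sketch of this flow is given here as an illustration only. The `Notification` tuple, the function names, and the idea of filtering by the Android package name `com.whatsapp` are assumptions; the report does not show how the Notification and MessageAnalysis classes are implemented. The three analysis stages are passed in as callables so that the sketch stays independent of any specific library.

```python
from typing import Callable, Iterable, List, NamedTuple

WHATSAPP_PACKAGE = "com.whatsapp"  # assumed filter key for WhatsApp notifications


class Notification(NamedTuple):
    package: str  # app that posted the notification
    sender: str
    text: str


def whatsapp_only(notifications: Iterable[Notification]) -> List[Notification]:
    """Keep only the notifications posted by WhatsApp."""
    return [n for n in notifications if n.package == WHATSAPP_PACKAGE]


def analyze(text: str,
            preprocess: Callable[[str], str],
            vectorize: Callable[[str], list],
            classify: Callable[[list], str]) -> str:
    """Run the three stages in order: preprocess, vectorize, classify."""
    return classify(vectorize(preprocess(text)))


if __name__ == "__main__":
    incoming = [
        Notification("com.whatsapp", "Friend", "  hello  "),
        Notification("com.android.mms", "Bank", "your code is 1234"),
    ]
    for n in whatsapp_only(incoming):
        # Toy stages standing in for ANLP, the vectorizer, and the model.
        label = analyze(n.text, str.strip, lambda t: [len(t)],
                        lambda v: "APROP" if v[0] < 20 else "NOT_APROP")
        print(n.sender, label)
```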
If the application finds an inappropriate message, it creates an alert through the Alert class. Each
alert contains the message sender, time, and content, and is sent to the parent as a warning. The parent
can also view advice that could help address the issue.
An alerts history is created automatically for each child in the parent account after the child is
added, through the AlertsHistory class. It lets the parent save a child's alerts, which would otherwise
be destroyed when their date expires.
A Child cannot exist without a Parent, and Notification and AlertsHistory cannot exist without a
Child, which is why there is a composition relation between them.
Figure 25: Class Diagram of Etiqa'a | اتقاء system
Chapter 4: DESIGN CONSIDERATIONS
4.1 Design Constraints
4.1.1 Hardware environment
4.1.2 End user characteristics
4.2 Architectural Strategies
4.2.1 Algorithms to be used
4.2.2 Project management strategies
4.2.3 Development method
In this chapter, we discuss the design constraints (hardware and software environment, and end-user
characteristics) and the architectural strategies used in the implementation process of the
Etiqa'a | اتقاء application.
These restrictions are usually imposed by the development organization, the customer, or external
regulations. Hardware, software, operational procedures, interfaces, data, and any other component of the
system can be constrained.
Hardware:
The system will be compatible with Android devices that have the following attributes:
Table 9: End-User Characteristics
User Characteristics
Size of the user group: Full population of Arab parents who have children from the age of 7 to 16.
Educational level/Qualifications: Can at least read and has a little experience with technology.
Disabilities: For now, the system does not use any assistive tools for people with physical/visual
disabilities, so they are not included.
Alsubait and Alfageh [35] studied the detection of bullying texts in YouTube comments and used three
machine learning algorithms to train the model. Their results showed that the best algorithms they used
were:
• Complement Naïve Bayes (NB) classifier: A variant of Multinomial Naïve Bayes that uses statistics
from the complement of each class to compute the model's weights.
• Logistic Regression (LR): A linear classification model that takes a feature vector, learns a
weight for each feature, and predicts the class of an item from the weighted sum of its features.
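To make the Logistic Regression description concrete, the prediction step can be sketched in plain Python. The weights, bias, and the 0.5 threshold in this example are illustrative values, not the parameters of the model trained in this project; training (fitting the weights) is not shown.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights: Sequence[float], bias: float,
                  features: Sequence[float]) -> float:
    """Probability of the positive class from the weighted sum of features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict(weights: Sequence[float], bias: float,
            features: Sequence[float], threshold: float = 0.5) -> str:
    """Return NOT_APROP when the probability reaches the threshold."""
    p = predict_proba(weights, bias, features)
    return "NOT_APROP" if p >= threshold else "APROP"


if __name__ == "__main__":
    # Two toy features: counts of an offensive word and of a friendly word.
    weights, bias = [2.0, -1.0], 0.0
    print(predict(weights, bias, [1, 0]))
    print(predict(weights, bias, [0, 1]))
```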
AlGhamdi and Khan [32] created a system that finds suspicious messages in Twitter tweets and used six
supervised machine learning algorithms. Their results showed that the best algorithms they used were:
• Support Vector Machine (SVM): Used for solving classification problems. Its training
algorithm builds a model that assigns text examples to classes. The goal is to produce
a classification model from the training set and use it to predict the testing set. The resulting
model depends only on a subset of the training dataset (the support vectors).
• k-Nearest Neighbors (KNN): Can be used for classification by leveraging similarities
among textual content. Given a set of labeled training data and a test item, it compares the
item's vocabulary with that of the training items and assigns the label of the most similar ones.
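The vocabulary comparison behind KNN can be illustrated with a small Python sketch. Jaccard similarity over word sets and majority voting among the k most similar items are choices made for this example; the cited work may use a different similarity measure or feature representation. English toy sentences are used only for readability.

```python
from collections import Counter
from typing import List, Tuple


def jaccard(a: set, b: set) -> float:
    """Share of words that two texts have in common."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def knn_predict(train: List[Tuple[str, str]], text: str, k: int = 3) -> str:
    """Label a text by majority vote among its k most similar training texts."""
    words = set(text.split())
    ranked = sorted(train,
                    key=lambda item: jaccard(words, set(item[0].split())),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [
        ("good morning friend", "APROP"),
        ("have a nice day", "APROP"),
        ("you are stupid", "NOT_APROP"),
        ("stupid ugly loser", "NOT_APROP"),
        ("you ugly fool", "NOT_APROP"),
    ]
    print(knn_predict(train, "you are ugly", k=3))
    print(knn_predict(train, "good morning", k=1))
```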
Figure 26: Project management –Trello program card
Gantt chart
A Gantt chart is a visual representation of tasks: it shows what work is scheduled and when each
task must be submitted. The Gantt chart of our project management plan can be found in Appendix B.
with 5 phases that flow sequentially, emulating a cascading waterfall in its diagram and process
flows: no phase begins until the prior phase is complete, and returning to a previous phase is not
allowed.
1- Requirement gathering and analysis: this phase describes how the system will work through its
requirements and analyzes them using different types of UML diagrams.
For this phase, a questionnaire was used to study the feasibility of the project, determine the age
group targeted by the project, and give end users space to suggest desired features. After that,
previous work from research and similar applications was studied to find its drawbacks and decide the
project requirements, which were analyzed using DFD, Use Case, Activity, Sequence, and Class Diagrams.
2- System design: this phase focuses on all aspects of the system design, namely the architecture
design, database design, and interface or prototype design.
The project has a relational database that was designed using a conceptual ER-Diagram, and the
interfaces were designed based on the design principles, with usability goals taken into
consideration.
3- Implementation: the design and implementation phases are interleaved; here we convert the structure
into an executable program that satisfies the desired functional requirements.
The application is created either by writing a program from scratch or by reusing and configuring
already existing components. Program errors are then found and fixed during debugging.
The implementation of our project started by building the machine learning model and the user
interfaces, followed by coding the functions for both the child and parent devices, building the
database and integrating it with the application, and finally integrating the model with the
application.
4- Testing: during the testing phase, developers execute the system using test cases produced from the
specification of the real data to determine whether the code meets the requirements and intended
functions. It has 3 stages: component testing done by the developer, system testing done by the
testing team, and user testing done by end users.
For our project, the tests carried out were unit testing; integration testing, which was done after
each component was integrated (such as integrating the model with the application); and system
testing, which we did after the whole project was finished by testing each requirement to make sure
that the system works correctly.
After we finished the development tests, we moved to user testing and tested the acceptance of our
application by the users.
5- Maintenance: After the product has been fully operational, this phase begins. Software upgrades,
corrections, and repairs are all examples of software maintenance. Application upgrades and
integration with newly deployed systems by the customer are frequent requirements.
Chapter 5: SYSTEM DESIGN
5.1 System Architecture
5.2 System Database
5.3 Interface Design
The system design is presented in this chapter. The architecture of the system is shown in Section
5.1, the database design using the entity relationship diagram is shown in Section 5.2, and the
interface design and its description are presented in Section 5.3.
This application has six main components, as shown in Figure 28: the Parent device component, which
has two subcomponents (Login component and Service component); the Child device component, which has
two subcomponents (Login component and Notification component); the Database component; the PhpAPI
component; the PythonAPI component; and the Machine learning tool, through which inappropriate
messages are detected (it is detailed in Chapter 7).
Firstly, the Parent device component contains the Login component, in which the parent enters
his/her email and password to be able to use the application, and the Service component, through which
the parent can get many services such as adding a child, getting an alert, getting advice, and
adjusting account settings. The PhpAPI component interacts with the Parent device component by
receiving a request from it to obtain these services or update information, and returning a response
to it.
Secondly, the Child device component contains the Login component, in which the parent enters
his/her email and password, and then gives permission to the Notification component to allow the system
to monitor notifications. The Notification component in turn interacts with the PythonAPI component by
sending it requests with notifications to process.
Lastly, the PythonAPI component sends the notification details to the Machine learning tool
component and gets their classification. If the notification is inappropriate, it is stored in the
Database component and an alert is sent to the parent's device.
Figure 28: Component architecture of Etiqa'a | اتقاء system.
Entity Relationship Diagrams, or ER-Diagrams, show the relationships between the entity sets
that are maintained in a database, along with their attributes, to explain the logical structure of
the database.
The Child entity depends on the Parent entity, so it is a weak entity. It has a name attribute as a
partial primary key, in addition to the gender and age attributes.
A child may receive WhatsApp messages, which go through processing and classification to be labeled
as either appropriate or inappropriate. Only inappropriate WhatsApp messages are stored so that they
can be shown to the parent. The WhatsAppMessage entity attributes are the sender, the message content,
the date and time the message was received, and msg_ID as a partial primary key to identify each
message.
Inappropriate messages remain stored until their expirationDate indicates that they have expired,
unless the parent chooses to save and keep them.
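The retention rule for stored messages can be sketched as a small Python function. The `StoredMessage` fields and the function name are hypothetical, and treating a message as expired once its expiration date is earlier than today is an assumption; the report does not state the exact comparison or the retention period.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class StoredMessage:
    msg_id: int
    expiration_date: date
    saved: bool = False  # True once the parent saves the alert


def purge_expired(messages: List[StoredMessage], today: date) -> List[StoredMessage]:
    """Keep saved messages and messages that have not expired yet."""
    return [m for m in messages if m.saved or m.expiration_date >= today]


if __name__ == "__main__":
    today = date(2023, 2, 10)
    stored = [
        StoredMessage(1, date(2023, 2, 1)),              # expired, not saved
        StoredMessage(2, date(2023, 2, 1), saved=True),  # expired, but saved
        StoredMessage(3, date(2023, 3, 1)),              # still valid
    ]
    print([m.msg_id for m in purge_expired(stored, today)])
```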
There is also an Advice entity that gives the parent advice which could help them deal with or
address the issue. The entity consists of a category attribute, by which the advice is classified into
several categories; a title attribute, which distinguishes the several sections that each category
consists of; a reference URL; and an ID as the primary key attribute.
1- Mapping of regular entities
The ER-Diagram of Etiqa'a contains 2 regular entities, Parent and Advice. A relation table has
been created for each of them with its simple attributes.
2- Mapping of weak entities
The ER-Diagram of Etiqa'a contains 2 weak entities, Child and WhatsAppMessage. A relation table
has been created for each of them with its simple attributes. Because they are weak entities,
each relation includes its owner's primary key as a foreign key, which is combined with its own
partial key to form a composite primary key.
The Child table includes the primary key ParentID from its owner table Parent, while the
WhatsAppMessage table is owned by the Child table and therefore includes the Child primary
key, which is a composite key of the ChildID and the ParentID.
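The weak-entity mapping described here can be tried out with a short, self-contained sketch. It uses Python's built-in sqlite3 module with an in-memory database instead of the MySQL database used by the project, purely so that the example runs on its own. Column names other than ParentID, ChildID, and msg_ID, as well as the ON DELETE CASCADE behaviour, are assumptions made for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Parent (
    ParentID INTEGER PRIMARY KEY,
    name TEXT, email TEXT UNIQUE, password TEXT, gender TEXT
);
-- Weak entity: the owner key (ParentID) plus the partial key (ChildID).
CREATE TABLE Child (
    ParentID INTEGER NOT NULL REFERENCES Parent(ParentID) ON DELETE CASCADE,
    ChildID  INTEGER NOT NULL,
    name TEXT, age INTEGER, gender TEXT,
    PRIMARY KEY (ParentID, ChildID)
);
-- Weak entity owned by Child: the composite Child key plus msg_ID.
CREATE TABLE WhatsAppMessage (
    ParentID INTEGER NOT NULL,
    ChildID  INTEGER NOT NULL,
    msg_ID   INTEGER NOT NULL,
    sender TEXT, content TEXT, received_at TEXT,
    PRIMARY KEY (ParentID, ChildID, msg_ID),
    FOREIGN KEY (ParentID, ChildID)
        REFERENCES Child(ParentID, ChildID) ON DELETE CASCADE
);
"""


def create_db() -> sqlite3.Connection:
    """Create an in-memory database with the three related tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = create_db()
    conn.execute("INSERT INTO Parent (ParentID, name) VALUES (1, 'Parent A')")
    conn.execute("INSERT INTO Child (ParentID, ChildID, name) VALUES (1, 1, 'Child A')")
    conn.execute("INSERT INTO WhatsAppMessage (ParentID, ChildID, msg_ID, content) "
                 "VALUES (1, 1, 1, 'example')")
    conn.execute("DELETE FROM Parent WHERE ParentID = 1")
    # With cascading deletes, the child and its messages go with the parent.
    print(conn.execute("SELECT COUNT(*) FROM Child").fetchone()[0])
```

The cascading delete mirrors the composition relation stated in the class diagram, where a child cannot exist without its parent.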
Figure x:…
• Normalization for Relational Schema
This part shows the process of minimizing data redundancy if it exists.
o First Normal Form (1NF)
Figure 31: Interface Structure Design (ISD) of Etiqa'a | اتقاء system.
5.3.2 Prototype
Design Principles were considered when designing the application interfaces, which are:
Visibility, Feedback, Constraints, Consistency, and Affordance.
As shown in Figure 32, when a user initially launches the application, an interface with the
application's logo and a brief description of the application appears.
Figure 32: 0- About the application interface.
As shown in Figure 33, the app will then display an interface with the parent's device and the child's
device as options for the user to enter the app.
Figure 34: 2- Create an account or login selection interface
The interfaces for creating an account and logging in are shown in Figure 35 and Figure 36.
Figure 36: 2.2- Login interface
After the user enters his/her information, a verification code will be sent to the email entered
by the user, and an interface will appear to verify the code, as shown in Figure 37.
A screen that leads to a login will appear once the user creates an account, as shown in Figure 38.
Figure 39 shows the screen that will appear after the login is complete and contains some
instructions for using the device.
If the user had previously registered children, the user would then be presented with a homepage
that contains the inappropriate messages, as shown in Figure 40. If no children have yet been added, the
homepage will show up as in Figure 41.
Figure 40: 2.2.1.2- Parent's homepage with child added interface
Figure 41: 2.2.1.1- Parent's homepage with no child added interface
As shown in Figure 42, a specific message will appear with additional information if the user
selects it.
The advice interfaces are shown in Figure 43 and Figure 44.
The user can switch to other interfaces from the Settings interface, as seen in Figure 46, Figure
47, Figure 48, and Figure 49.
Figure 46: 2.2.1.2.3.1- Account info interface.
Figure 47: 2.2.1.2.3.2- Children list interface.
Figure 48: 2.2.1.2.3.3- Alert history interface
Figure 49: 2.2.1.2.3.4- Help center interface
From the previous Figure 46, the user can go to the account editing interface as shown in Figure
50 and if he/she chooses to delete the account, a warning window will appear as shown in Figure 51.
From the previous Figure 47, the user can move to an interface that enables him/her to add a child
as shown in Figure 52. After adding the child, a user interface will appear containing the child’s
information through which he can modify and delete the child’s account, as shown in the following
Figure 53 and Figure 54.
Figure 52: 2.2.1.2.3.2.3- Add child interface
Figure 53: 2.2.1.2.3.2.2- Not activated child's device interface
Figure 53: 3.2- Choose a child interface
Figure 54: 3.2.1- Inactive child interface
Figure 52: 3.2- Choose a child interface.
Figure 53: 3.2.2- Inactive child interface.
Figure 54: Permission window.
Figure 55: 3.2.1- Active child interface
Figure 59: 2.2.1.2.3.4.1- Application instructions interface
Figure 60: 2.2.1.2.3.4.2- Instructions for adding a child interface
Figure 58: 2.2.1.2.3.1.1.1- Confirm account deletion interface.
Chapter 6: DATA GATHERING
6.1 Instagram “Cyberbullying” Dataset
To detect harmful/inappropriate words in Arabic, we needed a dataset to train our model. We
combined the datasets listed below and labeled them as APROP for appropriate comments and
NOT_APROP for inappropriate ones, and we used the combined dataset to train our model.
To find accounts, an internet search for Arab social influencers was conducted. Google was used
to find the accounts of Arabic fashionistas, singers, YouTubers, and bloggers who had been bullied.
To make sure that the selected accounts would meet the objectives of the study, the dataset's
authors used the following quality assessment criteria: the account must be an Instagram profile, the
account must be Arabic, and the post must have at least 200 comments.
In March 2021, they crawled Instagram posts dated from 2019 to 2021 using the official Instagram
APIs, collecting 200,000 Arabic comments in total.
Dataset Labeling
The manual labeling approach was chosen because of the different dialects and because some
comments may contain bullying/hatred without any obviously offensive words. The labeling task was
done by three annotators who spoke three different Arabic dialects (Jordanian, Egyptian, and Iraqi).
All of the annotators had a bachelor's degree and were between the ages of 23 and 27. After
categorizing the comments as positive, negative, or neutral, they further classified the negative
comments by their level of negativity into two categories (toxic and bullying), as shown in Table 10.
The annotators also manually labeled the dialect. If the dialect of the comment was not obvious, the
annotators wrote NA (not available).
Table 10: labeling samples
There were 46,898 comments in total: 18,193 negative, 17,376 positive, and 11,329 neutral. The
annotators were asked to reclassify the negative comments into two groups (bullying or toxic). The
final corpus contained 12,256 bullying comments, 5,937 toxic comments, 17,376 positive comments, and
11,329 neutral comments. The dataset included four dialects: Egyptian, MSA, Gulf, and Levantine.
Table 11 shows a sample of the Gulf data.
6.2 Twitter “Dangerous” Dataset
The dataset was collected by Alshehri et al. [38]. A constructed list of ‘dangerous’ seed words was
used to search Twitter. Using the REST API for two weeks in 2020, 2.8M tweets were collected. User IDs
were then extracted for all the users who had contributed to the REST API data (399K users in total),
and the timelines of those users were crawled, resulting in 705M tweets, of which 107.5M contained one
or more items from the ‘dangerous’ seed list. Combining the two datasets (the REST API dataset and the
dataset based on the timelines) resulted in 110.3M tweets.
Dataset Labeling
At first, 1K tweets from the REST API dataset were randomly sampled, and two of the authors
annotated each tweet as ‘dangerous’ or not (‘safe’). The Kappa (κ) score for this sample annotation
was 0.57. Another random sample of 4K tweets was then added to the annotation pool (for a total size
of 5K). After extensive revision of the disagreement cases by the two annotators, the κ score for the
whole dataset (5K) was found to be 0.90.
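For readers unfamiliar with the κ score, the following Python sketch shows how agreement between two annotators can be computed. It assumes that the reported κ is Cohen's kappa, which is the usual choice for two annotators, although the report does not name the variant. The toy labels in the example are invented for illustration and are not taken from the dataset.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Agreement between two annotators, corrected for chance agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled at random with the
    # same label frequencies they actually used.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    annotator_1 = ["dangerous", "dangerous", "safe", "safe"]
    annotator_2 = ["dangerous", "safe", "safe", "safe"]
    print(cohen_kappa(annotator_1, annotator_2))
```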
The resulting annotated dataset had a total of 1,375 tweets in the ‘dangerous’ class and 3,636 in
the ‘non-dangerous’ class.
To further understand dangerous language, the 5,011 tweets from the annotated dataset were
analyzed. The authors identified a number of patterns in the data, cutting across both the ‘dangerous’
and ‘safe’ classes. Table 12 shows a sample of the dangerous seeds and emojis. The dialect was also
manually labeled by the annotators.
Table 12: Top 10 most frequent ‘dangerous’ seeds and emojis in REST API dataset.
6.3 Multi Platforms Offensive Language Dataset
The dataset was collected by Absar Chowdhury et al. [39]. The data was collected from three
different online platforms: Twitter, Facebook, and YouTube, and Amazon Mechanical Turk (AMT) was used
to annotate it. In addition to the offensive comments, the contents were manually annotated to
analyze the distribution of hate speech (HS) and vulgar (but not hate) (V) content.
Dataset Labeling
The dataset includes binary labels: Non-Offensive or Offensive, as shown in Table 13. The label is
the final label agreed upon by at least 2 (out of 3) annotators. An expert then further classified the
offensive comments, indicating whether each comment is hate speech (HS), vulgar (V), or just
offensive (-).
Dataset Labeling
dialects. Tweets were assigned one of four labels: offensive, vulgar, hate speech, or clean. Because the
offensive label encompasses both vulgar and hate speech, and vulgarity and hate speech are not mutually
exclusive, a tweet can be solely offensive, or offensive together with vulgar and/or hate speech.
OFFENSIVE (OFF): Offensive tweets contain explicit or implicit insults or attacks on other
people, as well as inappropriate language, such as direct threats or incitement.
VULGAR (VLG): Vulgar tweets are offensive tweets containing expletives, such as references to
private parts or sexual acts.
HATE SPEECH (HS): Hate speech tweets are offensive tweets that target a specific group based
on shared characteristics such as race, e.g., “يا زنجي” (“yA znjy”, “O Negro”).
Chapter 7: IMPLEMENTATION
7.1 Development Tools
This chapter presents the tools and the implementation of the system. Section 7.1 presents the
development tools of the system, Section 7.2 presents the machine learning model implementation,
Sections 7.3 and 7.4 present the database implementation, Sections 7.5 and 7.6 present the mobile
application implementation, and Section 7.7 presents the integration of the model with the application.
o Scikit-Learn
Scikit-Learn is an easy-to-use open-source data analysis library that is widely regarded as a
standard for Machine Learning in the Python environment. It includes algorithmic decision-making
methods such as classification, which identifies and categorizes data according to patterns and
was used to build our model, as well as evaluation methods and metrics for the machine
learning model evaluation phase, and many vectorizers and feature extraction techniques
[44].
o NLTK
NLTK stands for Natural Language Toolkit, a popular open-source Python library for NLP that
supports many languages, including Arabic. It is used for text tokenization, stemming, stop
word removal, and so on [46].
A free software tool written in PHP for handling MySQL over the web. It supports a wide
variety of operations on MySQL and MariaDB. Frequently used operations (managing databases,
permissions, users, columns, relations, tables, indexes, and many others) can be performed
through the user interface, while any SQL statement can still be executed directly [47].
such as request handling, routing, etc. Flask is a good fit for beginners because it is simple
to use, and it can be applied to both straightforward and complex applications.
Additionally, it is used to quickly and easily deploy machine learning models [48].
cleaning and reviewing its labels, a part of its data was used in order to balance the appropriate and
inappropriate classes. After merging, we got 64,662 texts to train the model.
• Instagram Dataset: its comments were categorized into 4 labels: "toxic", "bullying", "neutral", and
"positive". The toxic and bullying labels were changed to inappropriate (NOT_APROP), and the neutral
and positive labels were changed to appropriate (APROP).
• Multi-Platform Dataset: its data was classified into 2 labels: "offensive" and "non-offensive". The
offensive label was changed to inappropriate (NOT_APROP) and the non-offensive label to
appropriate (APROP).
• Twitter Bullying Dataset: its tweets were categorized into 4 labels: “offensive”, “vulgar”,
“hateSpeech”, and “clean”. The first three labels were changed to inappropriate (NOT_APROP) and the
clean label was changed to appropriate (APROP).
• Twitter “Dangerous” Dataset: its tweets were categorized into 2 labels: “dangerous” and “safe”.
The dangerous label was changed to inappropriate (NOT_APROP) and the safe label was changed to
appropriate (APROP).
A sample of the dataset before and after labels unifying shown in Table 14 and Table 15
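The label unification described above can be sketched as a single lookup table. This is only an illustration: the dictionary name LABEL_MAP, the helper unify_label and the sample rows are assumptions, not the project's actual code; the label strings are the ones listed for the four datasets.

```python
# Hypothetical mapping from every source label to one of the two unified labels.
LABEL_MAP = {
    # Instagram dataset
    "toxic": "NOT_APROP", "bullying": "NOT_APROP",
    "neutral": "APROP", "positive": "APROP",
    # Multi-platform dataset
    "offensive": "NOT_APROP", "non-offensive": "APROP",
    # Twitter bullying dataset ("offensive" is already mapped above)
    "vulgar": "NOT_APROP", "hateSpeech": "NOT_APROP", "clean": "APROP",
    # Twitter "dangerous" dataset
    "dangerous": "NOT_APROP", "safe": "APROP",
}


def unify_label(label):
    """Return the unified label, failing loudly on a label we do not know."""
    try:
        return LABEL_MAP[label.strip()]
    except KeyError:
        raise ValueError(f"unknown label: {label!r}") from None


# Made-up rows standing in for (text, original label) pairs.
rows = [("text a", "toxic"), ("text b", "clean"), ("text c", "safe")]
unified = [(text, unify_label(label)) for text, label in rows]
print(unified)
```

Raising an error on unknown labels, rather than silently defaulting, makes a mislabeled or misspelled class visible during merging.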
7.2.1.2 Data Cleaning
Arabic text cleaning is an essential step before building the machine learning model: it removes
noise from the text, which is typically informal (not standard), and improves the model performance.
Cleaning is more difficult because of factors like the presence of dialect text, frequent spelling errors,
extra characters, diacritical markings, elongations, and so on.
According to Hegazi et al. [49], the majority of earlier algorithms clean noise by
anticipating all potential noise, then searching for each kind within the text and cleaning it. As a
result, the degree of noise anticipation determines how clean the text is, which can be challenging and
produce inaccurate results.
Therefore, in order to simplify the cleaning process and increase its accuracy, we eliminated
any non-Arabic characters, in addition to other items such as hashtags in the Twitter
datasets, Arabic diacritics or symbols, and so on. The summary of the cleaning process:
cleaning(text):
All the previous cleaning steps except steps 9 and 10 were defined in this function, using the regular
expression substitution function re.sub() to match and replace the specified patterns, and the strip() function to trim spaces.
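A minimal sketch of such a function is given here, assuming only the steps named in the text (hashtags, diacritics, non-Arabic characters, extra spaces). The exact patterns and their order are assumptions, since the original step list is not reproduced in this section; removing the tatweel (elongation) character is an added illustrative step.

```python
import re


def cleaning(text):
    """Illustrative cleaner: keeps Arabic letters only and trims whitespace."""
    # Drop hashtags such as "#topic" (the Twitter datasets contain them).
    text = re.sub(r"#\S+", " ", text)
    # Drop Arabic diacritics (U+064B..U+0652) and the tatweel character (U+0640).
    text = re.sub(r"[\u064B-\u0652\u0640]", "", text)
    # Replace everything that is not a basic Arabic letter or whitespace.
    text = re.sub(r"[^\u0621-\u063A\u0641-\u064A\s]", " ", text)
    # Collapse runs of whitespace into one space, then trim both ends.
    text = re.sub(r"\s+", " ", text)
    return text.strip()
```

Compiling the patterns once with re.compile would be a reasonable optimization when cleaning a large dataset.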
The ninth step (mapping) was done manually on the data only, because of differences in the way
some letters are used among Arabs; therefore no function was written to apply this step later to the text
messages.
The tenth step (correcting) was also done manually, because we did not find any tool that corrects
misspellings in Arabic dialects. Due to time constraints, the data was classified using the CAMeL Tools
pretrained dialect identification model to identify each dialect in the data; after that we chose to filter the data
and continue with the Saudi Arabic dialect only.
After cleaning the data we ended up with approximately 18,000 rows of data; Table 16 shows
a sample of the data before and after cleaning.
To find out the data distribution between our classes we plotted a histogram; as Figure 63 shows,
the data is approximately balanced.
7.2.1.3 Data Processing Using ANLP
Data processing is the manipulation of data by a computer. In this phase, the
data is converted from raw data to machine-readable form. NLP helps to get rid of ambiguity in the
language; it processes, analyzes, and understands a large amount of data.
The following steps were taken for data processing using NLP:
Table 17: The Result of Stemmers and Lemmatizers
CAMeL Tools has a built-in database for analyzing Gulf Arabic words (calima-glf-01). When we
tried it, the quality of stemming/lemmatizing was poor compared to Camel_lem and CAMeL_stem,
which use the default database (calima-msa-r13). The default database uses a model
(MLEDisambiguator.pretrained()) that is already pretrained to find the best stemmer/lemmatizer
result automatically, while the Gulf Arabic database only returns all the candidate results and lets us
choose which one we want. Since we cannot tell which is best, the first candidate
always ends up being chosen, even though it is not always the best choice.
For this reason we stayed with the default database (calima-msa-r13), since it gave much better results.
4. Remove diacritics
Since CAMeL Tools was used to extract word roots, it produced diacritized roots; therefore an
extra step to remove diacritics was needed.
ANLP(text):
All the previous processing steps were defined in this function, and everything that was used came from
the CAMeL Tools library. Starting with normalization, the functions that were used are
normalize_alef_maksura_ar to replace every (ى) with (ي), normalize_alef_ar to replace any variation
of Alef (أ إ آ) with the plain one (ا), and normalize_teh_marbuta_ar to replace every (ة) with (ه).
After that, the normalized text was tokenized using the simple_word_tokenize function, then the
disambiguate function of the pretrained morphological disambiguator was used to stem and lemmatize the text.
Lastly, we removed the diacritics using the dediac_ar function.
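The effect of the normalization and diacritic-removal steps can be illustrated with the standard library alone. This is a stand-in, not the CAMeL Tools implementation: the names normalize and anlp_sketch are hypothetical, whitespace splitting replaces simple_word_tokenize, and the stemming/lemmatizing step is left out because it needs the pretrained disambiguator.

```python
import re

# Alef with madda above, hamza above, hamza below -> plain alef (U+0627).
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")
# Arabic diacritics (fathatan .. sukun).
DIACRITICS = re.compile("[\u064B-\u0652]")


def normalize(text):
    """Apply the three letter normalizations described in the text."""
    text = ALEF_VARIANTS.sub("\u0627", text)
    text = text.replace("\u0649", "\u064A")  # alef maksura -> yeh
    text = text.replace("\u0629", "\u0647")  # teh marbuta -> heh
    return text


def anlp_sketch(text):
    """Normalize, split on whitespace, and strip diacritics from each token."""
    tokens = normalize(text).split()
    return [DIACRITICS.sub("", token) for token in tokens]
```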
Table 18 shows a sample of the data after applying the ANLP function.
In this step we also removed stop words, but only after making a copy of the processed data, so that we
could test all versions and find which one is the best.
For stop-word removal we used a mix of the NLTK stop words and a multi-dialect Arabic stop-words
list that was collected by Alaa [52].
remove_stopwords(text):
We first split the text into single tokens using simple_word_tokenize from CAMeL Tools,
then iterate over each token to check whether it exists in the stop-words list. If it does,
the token is deleted from the text. After the loop is finished, the tokens are joined back
into a sentence using the join method.
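A minimal sketch of this function follows. It assumes whitespace splitting in place of simple_word_tokenize, and the three-word STOP_WORDS set is a hypothetical placeholder for the combined NLTK and multi-dialect list.

```python
# Hypothetical placeholder; the project combined two much larger lists.
STOP_WORDS = {"في", "من", "على"}


def remove_stopwords(text, stop_words=STOP_WORDS):
    """Drop every token that appears in the stop-words list."""
    tokens = text.split()
    kept = [token for token in tokens if token not in stop_words]
    return " ".join(kept)
```

Keeping the stop words in a set makes each membership check constant time, which matters when the list holds thousands of entries.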
in the text; the most frequently occurring terms in a text are given the highest weights, even if
they may be unimportant or uninformative.
o Term Frequency-Inverse Document Frequency (TF-IDF): One of the most sophisticated solutions
to the TF problem. It gives uncommon or rare terms in any text a higher weight than
frequent ones by calculating the product of TF and IDF, where TF is the frequency of the
word in the current document and IDF is a measure of how rare the word is
across all documents.
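The product of TF and IDF can be sketched in a few lines. The sketch uses the textbook definitions (relative term frequency and log(N / df)), which is an assumption about the variant meant here; scikit-learn's TfidfVectorizer applies smoothing and normalization by default, so its numbers differ. The function name tf_idf and the toy documents are illustrative.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [["good", "morning", "good"], ["good", "night"]]
weights = tf_idf(docs)
print(weights)
```

A term that occurs in every document gets weight zero here, which is the behavior the paragraph describes: frequent, uninformative terms are down-weighted.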
Another popular feature extracting technique is the words N-gram, which is a continuous
series of N words or tokens from a given text, it could be a unigram of 1 word length as
illustrated in Table 19, bigram refers to length 2, trigram refers to length 3, and so on [54].
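Word n-grams are simply sliding windows over the token list. The helper name word_ngrams below is an assumption used for illustration.

```python
def word_ngrams(tokens, n):
    """Return all contiguous runs of n tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["how", "are", "you", "today"]
unigrams = word_ngrams(tokens, 1)
bigrams = word_ngrams(tokens, 2)
trigrams = word_ngrams(tokens, 3)
```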
Two studies have tested more than one feature extraction technique. The
first, by Abro et al. [55], used three types of features, namely n-gram (bigram) with
TF-IDF, Word2vec and Doc2vec; the highest F-score (77%) was obtained by the SVM classifier
using TF-IDF with bigram feature representation. The other, by
Maghfour et al. [56], tested TF with Unigram, TF with Unigram and Bigram, TF-IDF with
Unigram, and TF-IDF with Unigram and Bigram; the best performance was registered by the NB
classifier with the configuration TF with Unigram (F-score = 85.79%).
Because of the different nature of the data and the desired goal, both techniques (TF-IDF/Bigram and
TF/Unigram) were tested to get the best results, using the feature_extraction.text module of the
Scikit-learn library. For TF-IDF/Bigram the
TfidfVectorizer(ngram_range=(1,2)) class was used, and for TF/Unigram the
CountVectorizer(ngram_range=(1,1)) class was used. Before vectorizing, the data was split
into train and test sets using the train_test_split method from the Scikit-learn library.
7.2.2.1 Algorithms Selection
In sections 2.1 and 2.5 we reviewed in depth the algorithms researchers have used and their results. In
4.2.1 we chose the algorithms that yielded the best results for the Arabic language:
Complement Naïve Bayes classifier, Logistic
Regression, Support Vector Machine (SVM), and k-Nearest Neighbors (KNN). To make sure no good
algorithms were left out, we added two more algorithms based on expert
recommendation: Random Forest
and Decision Tree.
1- Naïve Bayes (NB): Both the Naive Bayes classifier for multinomial models and the Complement
Naive Bayes classifier were used to build a model; multinomial yielded the best results, so it was
chosen for the training and testing phase.
2- Logistic Regression (LR): The Logistic Regression classifier was used with no added penalty, with
random_state set to None and multi_class set to 'auto'.
3- Support Vector Machine (SVM): SVC classifier was tested using 4 different kernels: SVC with
linear kernel, SVC with linear kernel, SVC with RBF kernel and SVC with polynomial kernel,
polynomial was tested for different degrees from a range of 1 to 100, since SVC with RBF kernel
outperformed the others, it was chosen for training and testing phase.
4- K-Nearest Neighbors (KNN): The K-Neighbors classifier was tested with different numbers of
neighbors (n_neighbors) ranging from 1 to 1000; the best n_neighbors value was 46.
5- Random Forest (RF): The number of trees in the forest (n_estimators) was tested in the range
of 90 to 250, with the best results obtained at a value of 200.
6- Decision Tree (DT): The Decision Tree classifier was used with the split strategy at each node set to ‘best’ and
random_state set to None.
1- Dataset features extracted by lemmatizing text and keeping the stop-words.
2- Dataset features extracted by stemming text and keeping the stop-words.
3- Dataset features extracted by lemmatizing text and with stop-words being removed.
4- Dataset features extracted by stemming text and with stop-words being removed.
Table 20 and Table 21 show the accuracy scores obtained from model training and testing.
Table 21: Accuracy Score for Training Models Using Count Vectorizer
The best accuracy score, 79.2%, was for the Logistic Regression algorithm using Count Vectorizer trained on
lemmatized text with stop-words, and the worst was for K-Nearest Neighbors with 52.7%
accuracy.
To determine how well our model operates and the highest accuracy it can reach, we had to
train the model on different training sets, which requires breaking our data into many distinct segments.
To do that we used the k-fold cross-validation technique. Its basic idea is to partition the dataset
into k bins of equal size and then run k times; in each run, one of the k bins is taken as the test data and the
rest as the training data [57]. The best result was produced by lemmatized data with stop words using the
Logistic Regression algorithm with Count Vectorizer, with an accuracy score of 81.2%.
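The partitioning idea behind k-fold cross-validation can be sketched without any library. The generator name k_fold_indices is hypothetical, the folds are contiguous and unshuffled for simplicity, and the value of k is an assumption because the text does not state the one used.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k folds of near-equal size."""
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

Each sample lands in the test set exactly once, so averaging the k scores uses every row for both training and evaluation.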
1- Accuracy: the number of correct calls (true positives and true negatives) in
proportion to the total dataset [58]. The accuracy formula is shown in Figure 64, where t is true, f is
false, p is positive, and n is negative. The accuracy score for each model is shown in Table 20 and
Table 21.
2- Confusion matrix: a table that summarizes the effectiveness of a classification model by showing the
actual and predicted classes [58].
A confusion matrix for the best score of each algorithm was calculated and is shown in Figure
65, Figure 66, Figure 67, Figure 68, Figure 69, and Figure 70. Each confusion matrix was
plotted as a heatmap.
plot(y_true, y_pred)
This function was defined to take the unique labels of our data. Predicted labels were assigned as columns
while actual labels were assigned as rows; then the DataFrame function was used to build a table
from the confusion matrix data, and sns.heatmap was used to visualize the confusion matrix.
Figure 65: Confusion Matrix of SVM Model
Figure 66: Confusion Matrix of NB Model
Figure 67: Confusion Matrix of LR Model
Figure 68: Confusion Matrix of RF Model
Figure 69: Confusion Matrix of DT Model
Figure 70: Confusion Matrix of KNN Model
3- F1-Score: a function of the true-positive rate and the positive predictive value (Recall and Precision) that
gives an overall indication of the performance of a classifier [58]. Figure 71 shows the F1-Score formula,
where P is Precision and R is Recall.
The F1-Score was computed for the best accuracy case of each model: NB (80%),
SVM (79.2%), RF (80%), LR (81.2%), KNN (76.4%), and DT (74.3%). The F1 score
that resulted from the k-fold cross-validation technique was 81.5%, for the LR algorithm. The
difference between the F1-score and the accuracy score is very small because our data is
approximately balanced.
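The three measures above can be computed from the four confusion-matrix counts. The sketch below assumes the standard definitions, accuracy = (tp + tn) / (tp + tn + fp + fn) and F1 = 2PR / (P + R), and treats NOT_APROP as the positive class; the function names and the toy labels are illustrative, not the project's data.

```python
def confusion_counts(y_true, y_pred, positive="NOT_APROP"):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def f1_score(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


y_true = ["NOT_APROP", "NOT_APROP", "NOT_APROP", "APROP", "APROP", "APROP"]
y_pred = ["NOT_APROP", "NOT_APROP", "APROP", "APROP", "APROP", "NOT_APROP"]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(accuracy(tp, fp, tn, fn), f1_score(tp, fp, fn))
```

On this balanced toy sample the two scores coincide, which mirrors the observation that accuracy and F1 stay close when the classes are approximately balanced.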
The Pickle module is one of the most popular ways to serialize objects in Python. Our machine
learning model can be serialized and saved to a file using Pickle. The trained model can then be accessed and
used to make predictions by deserializing the file at a later time or in a different script. It provides the
following functions [60]:
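As an illustration of saving and restoring an object, the sketch below uses pickle.dump and pickle.load, plus their in-memory counterparts pickle.dumps and pickle.loads. The dictionary is only a stand-in for the trained model, and an in-memory buffer replaces the file so the example is self-contained.

```python
import io
import pickle

# Stand-in for the trained model object (an assumption for illustration).
model = {"algorithm": "LogisticRegression", "classes": ["APROP", "NOT_APROP"]}

# Serialize to a file-like object, as one would with open("model.pkl", "wb").
buffer = io.BytesIO()
pickle.dump(model, buffer)

# Deserialize later, for example in the API process.
buffer.seek(0)
restored = pickle.load(buffer)

# The same round trip using bytes instead of a file object.
restored_from_bytes = pickle.loads(pickle.dumps(model))
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.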
7.3 Database component
o Indexes
PRIMARY KEY (`parent_id`),
UNIQUE KEY `email` (`email`);
• Child table
o Table structure
o Indexes
FOREIGN KEY (`parent_id`) REFERENCES `parent` (`parent_id`) ON
DELETE CASCADE ON UPDATE CASCADE;
o Indexes
PRIMARY KEY (`msg_id`,`parent_id`,`child_name`),
FOREIGN KEY (`child_name`) REFERENCES `child` (`child_name`) ON
DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (`parent_id`) REFERENCES `parent` (`parent_id`) ON
DELETE CASCADE ON UPDATE CASCADE;
• Advice table
o Table structure
o Indexes
PRIMARY KEY (`advice_id`),
7.3.2 Create an Event to Delete Unsaved Messages Every Two Weeks
Events in MySQL are tasks that run according to a predetermined schedule; they are
sometimes called scheduled events. Events are named objects that contain one or more SQL
statements. They are stored in the database and executed at one or more scheduled intervals. For
instance, you could design an event that runs every Sunday at 1:00 AM and optimizes every table in the
database. In many situations, including database table optimization, log cleanup, data preservation, and
the creation of complex reports during off-peak hours, MySQL Events can be quite helpful [61].
Figure 76 shows the event that was created, which is executed daily. It deletes
messages that the user has not saved and that are two weeks or more old.
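The deletion rule applied by the event can be illustrated with Python's built-in sqlite3 module. This is a sketch of the rule only, not the MySQL event in Figure 76: the table layout, the column names saved and sent_at, and the function name delete_old_unsaved are all assumptions.

```python
import sqlite3
from datetime import datetime, timedelta


def delete_old_unsaved(conn, now):
    """Delete unsaved messages that are 14 or more days old; return the count."""
    cutoff = (now - timedelta(days=14)).isoformat()
    cursor = conn.execute(
        "DELETE FROM message WHERE saved = 0 AND sent_at <= ?", (cutoff,)
    )
    conn.commit()
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message (msg_id INTEGER PRIMARY KEY, saved INTEGER, sent_at TEXT)"
)
now = datetime(2023, 2, 1)
conn.executemany(
    "INSERT INTO message VALUES (?, ?, ?)",
    [
        (1, 0, (now - timedelta(days=20)).isoformat()),  # old and unsaved
        (2, 1, (now - timedelta(days=20)).isoformat()),  # old but saved
        (3, 0, (now - timedelta(days=3)).isoformat()),   # recent and unsaved
    ],
)
deleted = delete_old_unsaved(conn, now)
remaining = [row[0] for row in conn.execute("SELECT msg_id FROM message ORDER BY msg_id")]
```

Running the cleanup daily while filtering on a two-week age means each unsaved message is removed within a day of reaching that age.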
7.4 PHP API component
• To connect the PHP file to the database, we used PDO (PHP Data Objects), a lightweight,
consistent interface for accessing databases [62]. By creating a PDO instance with the
connection information (database name, username, password), we can issue SQL statements
from PHP to work with the database.
• Since Flutter cannot consume PHP output directly, we used JSON. JSON (JavaScript Object Notation) is
a lightweight data-interchange format [63]. We used it for data exchange between the
back end (PHP) and the front end (Flutter).
Example:
This SQL statement takes the parent's email and password and retrieves the row with the
same information. The values are bound through placeholders rather than inserted into the
SQL string, which prevents SQL injection.
$stmt = $con->prepare("SELECT * FROM `parent` WHERE `password`
= ? AND `email` = ?");
// Bind the submitted values to the placeholders and run the query
$stmt->execute(array($password, $email));
$data = $stmt->fetch(PDO::FETCH_ASSOC);
If a row with the same information is found in the database, a message in the form
of JSON is sent to Flutter with the success or failure status of the operation.
$count = $stmt->rowCount();
if ($count > 0) {
echo json_encode(array("status" => "success", "data" => $data));
} else {
echo json_encode(array("status" => "fail"));
}
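The same check-and-respond pattern is sketched here in Python with sqlite3 and json, purely to show the flow of a parameterized query followed by a JSON status message. The function name login_response, the reduced parent table and the sample account are assumptions; a production system would also store a password hash rather than the password itself.

```python
import json
import sqlite3


def login_response(conn, email, password):
    """Return a JSON string with the success or failure status of the login."""
    row = conn.execute(
        "SELECT parent_id, email FROM parent WHERE email = ? AND password = ?",
        (email, password),  # values are bound, never concatenated into the SQL
    ).fetchone()
    if row is None:
        return json.dumps({"status": "fail"})
    data = {"parent_id": row[0], "email": row[1]}
    return json.dumps({"status": "success", "data": data})


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parent (parent_id INTEGER PRIMARY KEY, email TEXT UNIQUE, password TEXT)"
)
conn.execute(
    "INSERT INTO parent (email, password) VALUES (?, ?)", ("a@example.com", "secret")
)
```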
• To connect Flutter, we created two functions: one deals with POST requests and the other
with GET requests. The http library is needed to handle the requests, and the convert
library is needed to convert from JSON to Dart objects.
The method postRequest(String url, Map data) deals with POST requests.
This method takes the URL that the request is sent to and the data that we need to
send (like the email and password in the previous example). Then it uses http to send the
request and saves the response, which contains the JSON message, in a variable.
var response = await http.post(Uri.parse(url), body: data);
After that, it decodes the JSON message and returns the success or failure status
that we can use in Flutter (for example, to check whether the parent exists in the database, as in the previous example).
var responsebody = jsonDecode(response.body);
return responsebody;
7.5 Parent device component
• Login
loginp():
This method is called if a parent chooses to log in from the parent device. It
checks whether the parent account exists in the database by searching for an account
with the same email and password. If the parent account does not exist, an error
message appears. Otherwise, the parent is taken to the home page and
his/her device token is updated in the database. What a token is will be explained at the end
of the section.
• Create account
Signup():
This method is called if a parent chooses to sign up, which can only be done on the
parent's device. It saves the parent account information in the database only if the
account does not already exist. If the parent account already exists in the database, an
error message appears. Otherwise, the parent is saved in the database
and an email is sent containing the verification code saved in the database.
After signing up, the confirmation() method is called.
This method checks whether the verification code entered by the parent matches the verification code
in the database.
• Change password
To change the password, the parent needs to confirm his/her email through the
confirmation() method. If he/she succeeds, the method forgetPassChange() is
called.
This method replaces the old password in the database with the new password.
• Add child
addChild():
This method is called if the parent wants to add a new child. It saves the new child's
account information in the database unless the parent already has a child with the same
name. If there is a child with the same name in the database, an error
message appears. Otherwise, the new child is saved in the database.
historyMessages():
This method displays all the messages marked as saved in the database.
• Edit the accounts
The parent can edit his/her account information.
The method accountInfo() displays the parent information saved in the database, and the
method editAccount() replaces the old parent account information with the
new information.
The parent can also edit a child's account information.
To edit child information, the parent needs to choose one of his/her
children; the method childrenList() displays all the children the parent has.
After choosing one of the children, the parent is taken to the child account page,
where he/she can edit the child's information by calling the method editChild(), which
replaces the old child information with the new information.
• Advice
advice():
This method displays all the advice in a specific category from the
database.
Regarding sending an alert to the parent's device, Firebase Cloud Messaging (FCM) was
used and will be detailed in Section 7.7. To receive alerts, Firebase must first be
added to the application along with the necessary dependencies, and then configured. An alert is
sent to the parent's device using the token obtained when he/she logs in. Each
instance of the client application is identified by a unique token string. To send a message
to a particular device, the token is necessary.
When an alert arrives while the user is inside the application, it does not
appear to the user unless an in-app (local) alert is added. This can be done
using the flutter_local_notifications package.
• onMessage.listen
FirebaseMessaging.onMessage.listen((RemoteMessage event):
This method handles messages when the application is in the foreground. A
RemoteMessage contains information about the payload, including its origin,
its unique ID, the time it was sent, whether it contained a notification, and more
[73]. The information is displayed by the flutterLocalNotificationsPlugin.show()
method from the flutter_local_notifications package, which shows the alert
on the screen.
• Notification
The flutter_notification_listener plugin is used to listen for all incoming notifications on Android.
Its features are [64]:
Service: launches a service to receive notifications.
Simple: the fields of a notification are easily accessible.
Backgrounded: after rebooting, runs the Dart code in the background to automatically start
the service.
Interactive: the notification is interactive in Flutter.
Flutter plugins are thin Dart wrappers on top of native (Java, Kotlin, ObjC, Swift) mobile
APIs and services. For instance, the only method to access a sensor on a phone is to create a
plugin or utilize one that is currently available. The plugin's API was created in Dart. The
plugin's implementation is written in Java/Kotlin for Android support, ObjC/Swift for iOS
support, or in both languages (for cross-platform support) [65].
This sub-component contains six methods which are: onInit, start Listening, callback,
initPlatformState, process message and activate the child account.
• onInit
@override
onInit():
initPlatformState()
This method is called initially when the screen is loaded, we need to call
initPlatformState method there to initialize the plugin.
• start Listening
startListening():
This method checks permission and starts the service.
• process message
processMsg(NotificationEvent msg):
• First, the http package is used to make POST requests to the server.
• Then, we must define the URL to which the POST request is sent.
• This method takes the notification and sends a POST request, containing the
notification and its details, from our Flutter application to the “/” route in the
Python API component, which we discuss in Section 7.7.
• NotificationEvent is a built-in class; it provides the title, message, package
name and date of each notification.
• initPlatformState
initPlatformState():
// register the static callback to handle the events
NotificationsListener.initialize(callbackHandle: _callback)
7.7 Python API component
Once the machine learning model is exported (discussed in 7.2.2.4), it can be turned into an API
using Flask, and requests can be sent to it from the Flutter application. The API does the following:
1- Uses the cleaning and ANLP functions discussed in the machine learning component (Section 7.2).
2- Gets the notification from the POST request.
3- Loads the model.
4- Predicts the notification label.
5- If the notification is inappropriate, stores it in the database.
6- Sends an alert to the parent's device.
Regarding sending an alert to the parent's device, Firebase Cloud Messaging (FCM) was used. FCM is
a free Google cloud service that enables app developers to deliver messages and
notifications to users across a variety of platforms, including Android, iOS, and web applications. FCM is
provided by Firebase, a company that Google purchased in 2014. Through an application programming
interface (API), FCM enables software developers to push notifications for their apps to end users. FCM
can send messages to apps in three different ways: directly to a single device, to a group of devices, or to
devices that have subscribed to a topic [66].
• First, we create an instance of our Flask application.
• Then we import the cleaning and ANLP functions discussed in the machine learning component
(Section 7.2).
• After that, to define a route in Flask, we use the decorator @app.route('/'), where app is the
name of the object containing our Flask application; it is used to handle POST requests from
our Flutter application.
• Then we get the request data (the notification from the application) and convert it from JSON to
key-value pairs.
• Then we pass the content of the notification as an argument to the cleaning function and then to the
ANLP function.
• Then we apply the vectorizer to it to convert it to numbers, and send it to the model
to classify it.
• After that, we store the result of the classification in a variable called result.
• If the notification is classified as inappropriate, we store the notification details in
the database and get the parent token from the parent table.
Note: to start the server we call the run() method of the Flask object.
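The order of the steps above can be sketched as one framework-free function. Flask itself is deliberately left out so the example runs with the standard library only; every callable passed in (clean, anlp, vectorize, predict, store, notify) is a stand-in for the real component, and the payload key "message" is an assumption about the request format.

```python
import json


def handle_notification(body, clean, anlp, vectorize, predict, store, notify):
    """Classify one notification and act on it; returns the predicted label."""
    payload = json.loads(body)                # JSON body -> key-value pairs
    text = anlp(clean(payload["message"]))    # cleaning, then ANLP
    result = predict(vectorize(text))         # vectorize, then classify
    if result == "NOT_APROP":
        store(payload)                        # save the notification details
        notify(payload)                       # alert the parent's device
    return result


# Stand-in components used only to exercise the flow.
stored, alerts = [], []
components = dict(
    clean=str.strip,
    anlp=str.lower,
    vectorize=lambda text: text.split(),
    predict=lambda tokens: "NOT_APROP" if "bad" in tokens else "APROP",
    store=stored.append,
    notify=alerts.append,
)
result = handle_notification(
    json.dumps({"title": "Sender", "message": " Bad text "}), **components
)
```

Passing the components in as arguments keeps the flow testable without a database, a trained model or a messaging service.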
7.8 User Interfaces
Figure 77: About the application interface
Figure 78: Device selection interface
Figure 79: Login interface (Parent's device)
Figure 80: Create account interface (Parent's device)
Figure 81: Verification interface
Figure 82: Confirmation interface
Figure 83: Welcoming interface
Figure 84: Homepage with no child added (Parent's device)
Figure 85: Homepage with child added
Figure 86: Alert in more details interface (Parent's device)
Figure 87: Advice category interface
Figure 88: Specific category advice interface
Figure 89: Account settings interface
Figure 90: Account info interface
Figure 91: Children list interface
Figure 92: Alert history interface
Figure 93: Help center interface
Figure 94: Edit account info interface
Figure 95: Delete warning window
Figure 96: Add child interface
Figure 97: Inactive child interface (Child device)
Figure 98: Permission window (Child device)
Figure 99: Activation terminated interface
Figure 100: Application instructions interface (Child device)
Figure 101: Instructions for adding child interface
Figure 102: Confirm account deletion interface
Chapter 8: TESTING
This chapter discusses the testing stage of the Etiqa'a | اتقاء application, carried out to detect and resolve
any problems or difficulties and to ensure that the application is ready and free of errors that could
impair its function or effectiveness.
8.4 Acceptance Testing
The purpose of user acceptance testing is to evaluate the system's compliance with the
requirements and verify if it has met the required criteria for delivery to end users. This test is performed
by end users, allowing them to participate in the testing process and provide feedback on the application
to help it improve.
The users were observed during this test, and notes were taken about their performance and behavior on
each move they made while using the application.
User | Sent Sentence | Model's Classification | Was the message classified correctly? (Q4 answer) | Problem Justification
--- | --- | --- | --- | ---
User 1 | عمه في وجهك ياحمار ياوصخ | NOT_APROP | Yes | -
User 1 | كلب جحش حيوان غبي ثور | NOT_APROP | Yes | -
User 1 | اتوطا في بطنك | APROP | No | Training data size was small; hence it didn't cover all the inappropriate words
User 2 | السلام عليكم ورحمة الله وبركاته | APROP | Yes | -
User 2 | مساء الخير لمى | APROP | Yes | -
User 2 | وينك يا زق ليش ما تجين | NOT_APROP | Yes | -
User 2 | فاتك نص عمرك جنى الحيوانة اليوم جات | NOT_APROP | Yes | -
User 2 | يا حيوانه | NOT_APROP | Yes | -
User 3 | السلام عليكم كيفك | APROP | Yes | -
User 3 | لو انك ولد امك و ابوك قابلني بعد المدرسة وتشوف ايش حسوي فيك | NOT_APROP | Yes | -
User 3 | والله اقتلك | NOT_APROP | Yes | -
User 4 | ياكلبة | APROP | No | There are no good tools for correcting misspellings for Arabic dialects and training data was spelled correctly
User 4 | وريني صدرك واعطيك اللي تبين | NOT_APROP | Yes | -
User 4 | روحي انتحري محد يبغاك يا منبوذة | NOT_APROP | Yes | -
User 5 | يا غبي | NOT_APROP | Yes | -
User 5 | ياحبيبي | APROP | Yes | -
User 5 | سمعت انك كلمت اخويا ع اللي صار امس | APROP | Yes | -
User 5 | لو سمحت لا تتدخل مرة ثانية ولا قسم بالله اقتلك | NOT_APROP | Yes | -
User 6 | اذبحك | NOT_APROP | Yes | -
User 6 | غبيه ومحد يبغاك يامنبوذة | NOT_APROP | Yes | -
User 6 | فسخي وصوري لي وحاجيب لك | NOT_APROP | Yes | -
User 7 | يا حقيره | NOT_APROP | Yes | -
User 7 | زباله انقلعي | NOT_APROP | Yes | -
User 7 | لا ترسلي شي | APROP | Yes | -
User 8 | كل شي سيء في الحياه منك | APROP | No | Limited training data
User 8 | الله يلعنك | NOT_APROP | Yes | -
User 8 | شكلك المعفن ينحسنا | NOT_APROP | Yes | -
User 9 | حالة اللي يغتصبك ويقتلك بعدها يا سالم | NOT_APROP | Yes | -
Our application was tested by 9 different parents aged from 20 to over 50. The tests were
conducted in person: the application was downloaded to each user's device, and the users were then asked
to use it. After the users finished using the application, a questionnaire was given to them to get their
feedback (see Appendix A).
55.6% of the users of our application were experts in technology, 33.3% considered their level in technology
to be medium, and 11.1% considered themselves to be beginners.
The users were asked to use the Etiqa'a | اتقاء application without limitation. The parents set up their
accounts on their devices and on their children's devices with no problems, then sent messages to
their child's WhatsApp to test the application. The results show that 25/28 (89.3%) of the messages
sent by the users were classified correctly, while 3/28 (10.7%) were misclassified. The
misclassifications had the following causes:
1- The model was trained on a small and limited dataset.
2- The lack of tools that can correct misspelled words in Arabic dialects.
100% of the users said they received a notification alerting them about the inappropriate messages.
88.9% of the parents found the application clear and easy to use, while 11.1% found it difficult to
understand some of the instructions and phrases in the application.
Most parents suggested that the model be taught more inappropriate words to improve the application's
performance, and all of them confirmed that the Etiqa'a | اتقاء application is valuable and will be very
useful in protecting their children. Overall, all the users were pleased with the application and its services.
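The classification percentages reported above follow directly from the message counts (25 correct and 3 misclassified out of 28). A short check, with the helper name percent chosen only for illustration:

```python
def percent(part, whole, digits=1):
    """Return part/whole as a percentage rounded to the given digits."""
    return round(100 * part / whole, digits)


correct, total = 25, 28
correct_share = percent(correct, total)
misclassified_share = percent(total - correct, total)
print(correct_share, misclassified_share)
```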
Chapter 9: CONCLUSION
This chapter contains the conclusion of the project, the challenges we faced while building the
Etiqa'a | اتقاء application, and what we plan to do in the future to improve it.
To build a machine learning model we first needed a dataset. We gathered 4 different datasets from
different social media platforms and combined them into one dataset, whose content was
labeled either as appropriate (APROP) or inappropriate (NOT_APROP). The dataset was then cleaned:
duplicate rows and rows that had one or no characters were removed, non-Arabic characters and
emojis were removed, extra spaces at the beginning and end of sentences were removed, and multiple spaces in
the middle of sentences were replaced with a single space. We corrected misspelled words and mapped
extended Urdu and Persian letters to standard letters; for example, all of ۈۇۆۅۄۋ were mapped to و. The
data was then normalized and tokenized, and different stemmers and lemmatizers were tried to find the one that
best improves classification performance; the CAMeL lemmatizer and stemmer were then chosen. Data
features were extracted using 2 methods (Count Vectorizer and Term Frequency-Inverse Document Frequency
(TF-IDF)). The data was tested using 6 different algorithms: Complement Naïve Bayes classifier, Logistic
Regression, Support Vector Machine (SVM), k-Nearest Neighbors (KNN), Random Forest and Decision
Tree. Each algorithm was tested using the two vectorizers in 4 different dataset conditions:
1- Dataset features extracted by lemmatizing text and keeping the stop-words.
2- Dataset features extracted by stemming text and keeping the stop-words.
3- Dataset features extracted by lemmatizing text and with stop-words being removed.
4- Dataset features extracted by stemming text and with stop-words being removed.
The results showed that the best feature extraction approach was the CAMeL lemmatizer, which performed best
with the stop-words not removed. The algorithm that yielded the best results was Logistic Regression,
with an accuracy of 81.2% and an F1 score of 81.5%.
Before creating the application, a database was created, the database is made of 4 tables, a table
for parents’ information, another for child’s information, a table to save messages in, and table for advice,
the database component is mentioned in details in section 7.3
As for the application itself, we built a system that behaves differently depending on whether it is
running on the parent’s device or the child’s device. This is determined when the parent first opens the
application and chooses whether he/she is on the child’s device or his/her own device. On the parent’s
device, the application allows the parent to perform these main functions: ‘create account’, ‘login’, and
‘add a child’. The parent can only create an account when using the application on his/her own device;
on the child’s device there is no create account option. After confirming his/her email, the parent can
log in and add a child, which requires the child’s name and date of birth. The child is then shown in
gray and is not activated until the parent logs in to the Etiqa'a | اتقاء application from the child’s
device to give access permission.
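The account and activation rules described above can be sketched as a small state model. This is only an illustrative Python sketch: the class and method names are hypothetical, and the real application implements this flow in Flutter backed by the project database.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Child:
    name: str
    date_of_birth: date
    active: bool = False  # inactive children are the ones shown in gray


@dataclass
class ParentAccount:
    email: str
    email_confirmed: bool = False
    children: List[Child] = field(default_factory=list)

    def add_child(self, name: str, date_of_birth: date) -> Child:
        # A child can only be added after the parent's email is confirmed.
        if not self.email_confirmed:
            raise PermissionError("email must be confirmed before adding a child")
        child = Child(name, date_of_birth)
        self.children.append(child)
        return child

    def inactive_children(self) -> List[Child]:
        # The list offered when the parent logs in from the child's device.
        return [c for c in self.children if not c.active]

    def activate_on_child_device(self, child: Child, permission_granted: bool) -> bool:
        # Simplification: the child becomes active only if access permission is given.
        if child not in self.children:
            raise ValueError("child does not belong to this account")
        child.active = bool(permission_granted)
        return child.active
```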
The parent can log in to the application using the account he/she created on his/her own device. When
the parent logs in from the child’s device, the inactive children are shown and the parent chooses which
child the device belongs to. The child then becomes active, and the application asks for access
permission. Once the parent grants it, the Etiqa'a | اتقاء application starts listening to notifications
using the flutter_notification_listener plugin in the Flutter application. Notifications are filtered so
that only WhatsApp Messenger notifications are sent to the ML model to be analyzed and classified as
appropriate or inappropriate. The ML model and the application are connected through the Python API
component, which exposes the model as an API using a Flask application that receives requests from the
Flutter application. This component allows WhatsApp messages to be read from the child’s device and sent
to the model, which then cleans the message, performs ANLP, and predicts whether the message is
appropriate or inappropriate. If the ML model finds the message inappropriate, the message is stored in
the database and a notification is sent via Firebase Cloud Messaging (FCM) to the parent’s device to
inform him/her about the content of the message.
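The message flow described above can be summarized in a short Python sketch. It is a simplification under stated assumptions: the notification is represented as a plain dictionary with hypothetical keys "package" and "text", and the classifier, database write, and FCM call are passed in as functions rather than implemented. In the project itself the filtering happens in the Flutter application and the classification sits behind the Flask API.

```python
WHATSAPP_PACKAGE = "com.whatsapp"  # Android package name of WhatsApp Messenger


def is_whatsapp(notification: dict) -> bool:
    """Keep only notifications that come from WhatsApp Messenger."""
    return notification.get("package") == WHATSAPP_PACKAGE


def handle_notification(notification, classify, store, notify_parent):
    """Run one notification through the pipeline.

    classify(text) is assumed to return "appropriate" or "inappropriate";
    store(text) and notify_parent(text) stand in for the database insert
    and the Firebase Cloud Messaging alert. Returns the label, or None
    when the notification is not from WhatsApp and is ignored.
    """
    if not is_whatsapp(notification):
        return None
    text = notification.get("text", "")
    label = classify(text)
    if label == "inappropriate":
        store(text)          # keep the flagged message for the parent's report
        notify_parent(text)  # alert the parent's device
    return label
```

Passing the classifier and side effects as parameters keeps the sketch testable without a trained model or a Firebase project; appropriate messages are neither stored nor reported.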
9.2 Challenges
During the development phase of this project, we encountered various challenges.
1- Lack of good tools for dealing with Arabic dialect text.
2- Lack of time to prepare more data and enhance model accuracy in classifying the messages.
3- Scarcity of resources documenting the packages used to listen to notifications.
4- Lack of time to try more ways of getting WhatsApp messages and select the best one.
5- We had difficulty designing screens that are responsive to different screen sizes, and we tried
many approaches and packages before we were able to solve this challenge.
6- Unexplained errors appeared for some team members but not others when sharing code, until it
became clear to us that this was because we were using different versions of the software.
Every project has its ups and downs, but we persevered and worked as a wonderful team to
complete it as well as we could. We had some challenges, but they helped us to be more patient, and we
are pleased with the outcomes.
3- Make the system usable for all people, including people with physical or visual disabilities.
4- Improve the model so that it can recognize all inappropriate/harmful words and sentences.
5- Make the classification of messages more accurate by adding more specified categories such as
suicidal, bullying, and sexual harassment.
6- Improve the message extraction method to overcome drawbacks such as not being able to see the
receiver’s (child’s) messages.
8- Improve our model to detect not only inappropriate messages but also inappropriate photos and voice
messages.
130
REFERENCES
[1] H. Firmansyah and A. A. Azha, "Analysis of Sexual Predator Network News Framing in
Children," Journal of Humanities and Social Sciences Innovation, vol. 2, no. 4, 2022.
[3] O. Oueslati, E. Cambria, M. B. HajHmdia and H. Ounelli, "A review of sentiment analysis
research in Arabic language," Future Generation Computer Systems, vol. 112, pp. 408-430, 2020.
[4] S. Marie-Sainte, N. Alalyani, S. Alotaibi, S. Ghouzali and I. Abunadi, "Arabic Natural Language
Processing and Machine Learning-Based Systems," IEEE Access, vol. 7, pp. 7011 - 7020, 2018.
[6] ICDL Arabia, "Cyber Safety Report: Research into the online behaviour of Arab youth and the
risks they face," 2015. [Online]. Available: https://icdlarabia.org/downloads/y8auRNa9oj.
[Accessed 12 October 2022].
[8] B. M. Fahmy, "Cyberbullying among Adolescents on Social Media Networks," Egyptian Journal
of Public Opinion Research, vol. 20, no. 3, pp. 289 - 335, 2021.
[10] B. Bason, "An Open Letter About Why and How We Use AI at Bark," Bark, 21 April 2021.
[Online]. Available: https://www.bark.us/blog/open-letter-brian-bason/.
131
[11] B. Bason, "How Bark Works," Bark, 2022. [Online]. Available: https://www.bark.us/how/.
[12] J. G. a. J. G. Eduardo Cruz, "The all-in-one parental control and digital wellbeing solution,"
qustodio, 2022. [Online]. Available: https://www.qustodio.com/en/.
[18] M. Elarnaoty and A. Farghaly, "Machine Learning Implementations in Arabic Text Classification,"
in Intelligent Natural Language Processing: Trends and Applications, Springer, 2018, pp. 295-
324.
[19] D. Jurafsky and J. H. Martin, Speech and language processing: An introduction to natural language
processing, computational linguistics, and speech recognition, Upper Saddle River, NJ: Pearson
Education, 2009.
[20] D. Khurana, A. Koli, K. Khatter and S. Singh, Natural language processing: state of the art, current
trends and challenges, Multimed Tools Appl, 2022.
[21] S. Nguyen, "Multilingual NLP: Solutions to challenges," StageZero Technologies, 19 July 2022.
[Online]. Available: https://stagezero.ai/blog/multilingual-nlp-solutions/. [Accessed 10 October
2022].
[22] H. Al-Najjar, Some Aspects of Ambiguity in English and Arabic: A Comparative Study,
Department of English, Faculty of Arts, Ibb University, 2008.
132
[23] O. Obeid, N. Zalmout, S. Khalifa, D. Taji, M. Oudah, B. Alhafni, G. Inoue, F. Eryani, A. Erdmann
and N. Habash, "CAMeL tools: An open source python toolkit for Arabic natural language
processing.," in Proceedings of the 12th language resources and evaluation conference, 2020.
[24] A. Muaad, H. Davanagere, M. Al-antari, J. V. Benifa and C. Chola, "AI-based misogyny detection
from Arabic levantine twitter tweets," Computer Sciences & Mathematics Forum, p. 15, September
2021.
[27] A. Barbaresi, "Trafilatura: A Web Scraping Library and Command-Line Tool for Text Discovery
and Extraction," Proceedings of the 59th Annual Meeting of the Association for Computational
Linguistics and the 11th International Joint Conference on Natural Language Processing: System
Demonstrations, August 2021.
[31] H. Ameur, A. Rekik, S. Jamoussi and A. Ben Hamadou, "ChildProtect: A parental control
application for tracking hostile surfing content," Entertainment Computing, 2022.
[32] M. AlGhamdi and M. A. Khan, "Intelligent Analysis of Arabic Tweets for Detection of
Suspicious," Arabian Journal for Science and Engineering, p. 6021–6032, 10 March 2020.
133
[33] F. Kateb and J. Kalita, "Classifying Short Text in Social Media: Twitter as Case Study,"
International Journal of Computer Applications, pp. 1-12, February 2015.
[34] A. Farghaly and k. Shaalan , "Arabic Natural Language Processing: Challenges and Solutions,"
ACM Transactions on Asian Language Information Processing, p. 1–22, 01 December 2009.
[35] T. Alsubait and D. Alfageh, "Comparison of Machine Learning Techniques for Cyberbullying,"
IJCSNS International Journal of Computer Science and Network Security, January 2021.
[36] A. Dennis, B. H. Wixom and R. M. Roth, Systems Analysis and Design, 5th ed., John Wiley &
Sons, 2012.
[37] R. ALBayari and S. Abdallah, "Instagram-Based Benchmark Dataset for Cyberbullying Detection
in Arabic Text," MDPI, vol. 7, no. 7, 2022.
[39] S. Chowdhury, H. Mubarak, A. Abdelali, S.-g. Jung, B. Jansen and J. Salminen, "A Multi-Platform
Arabic News Comment Dataset for Offensive Language Detection," in LREC - Language
Resources and Evaluation Conference, Marseille, France, 2020.
[40] H. Mubarak, A. Rashed, K. Darwish, Y. Samih and A. Abdelali, "Arabic Offensive Language on
Twitter: Analysis and Experiments," Qatar Computing Research Institut, pp. 126-135, 2021.
[42] L. Cianci, "Best IDEs for Flutter in 2022," 21 February 2022. [Online]. Available:
https://blog.logrocket.com/best-ides-flutter-2022/.
[43] K. Sherrer, "Google Colab vs Jupyter Notebook: Compare data science software," 25 May 2022.
[Online]. Available: https://www.techrepublic.com/article/google-colab-vs-jupyter-notebook/.
[45] O. Obeid, N. Zalmout, S. Khalifa, D. Taji, M. Oudah, B. Alhafni, G. Inoue, F. Eryani, A. Erdmann
and N. Habash, "CAMeL Tools: An Open Source Python Toolkit for Arabic Natural Language
134
Processing," in Proceedings of the Twelfth Language Resources and Evaluation Conference,
European Language Resources Association, 2020, pp. 7022-7032.
[47] M. Kofler, The Definitive Guide to MySQL5, vol. phpMyAdmin , Berkeley, CA: Apress, (2005),
p. pp 87–116.
[49] M. O. Hegazi, Y. Al-Dossari, A. Al-Yahy and A. Al-Sumari, "Preprocessing Arabic text on social
media," Heliyon, 2021.
[51] M. O. Alhawarat , H. Abdeljaber and A. Hilal, "Effect of stemming on text similarity for Arabic
language at sentence level," PeerJ Comput Sci, 2021.
[52] A. Alharbi, "Multi dialect Arabic stop words," 2021 Feb 22. [Online].
[53] I. Aljarah, M. Habib and N. Hijazi, "Intelligent detection of hate speech in Arabic social network:
A machine learning approach," Journal of Information Science, pp. 1-19, 2020.
[54] D. Gamel, M. Alfonse, E.-S. El-Horbaty and A.-B. Salem, "Implementation of machine learning
algorithms in Arabic sentiment analysis using n-gram features," in Procedia Computer Science ,
2019.
[55] S. Abro, S. Shaikh and Z. Khand, "Automatic Hate Speech Detection using Machine Learning: A
Comparative Study".(IJACSA) International Journal of Advanced Computer Science and
Applications.
[56] M. Maghfour and A. Elouardighi, Standard and Dialectal Arabic Text Classification for Sentiment
Analysis, Morocco, 2018.
135
[57] D. Berrar, "Cross-validation," Tokyo Institute of Technology, Tokyo 152-8550, Japan, 2019.
[58] G. Handelman1, H. Kuan Kok, R. Chandra, A. Razavi7, S. Huang, M. Brooks, M. Lee and H.
Asadi, "Peering into the black box of artificial intelligence: evaluation metrics of machine learning
methods," American Journal of Roentgenology, pp. 38-43, 2019.
[59] Dalianis and Hercules, "Evaluation metrics and evaluation," in Clinical text mining, 2018, pp. 45-
53.
[60] "Saving a machine learning Model," geeksforgeeks, 11 January 2023. [Online]. Available:
https://www.geeksforgeeks.org/saving-a-machine-learning-model/. [Accessed 26 January 2023].
[62] M. Achour, F. Betz, A. Dovgal, H. Magnusson and G. Richter, "PHP Manual," php, 09 February
2023. [Online]. Available: https://www.php.net/manual/en/intro.pdo.php. [Accessed 09 February
2023].
[63] "ECMA-404 The JSON Data Interchange Standard.," Ecma International , December 2017.
[Online]. Available: https://www.json.org/json-en.html.
[65] "Flutter Plugin or Dart Package?," medium, 16 August 2017. [Online]. Available:
https://medium.com/@mehmetf_71205/flutter-plugin-or-dart-library-246c68df15f. [Accessed 26
January 2023].
[68] "What is TensorFlow? The machine learning library explained," [Online]. Available:
https://www.infoworld.com/article/3278008/what-is-tensorflow-the-machine-learning-library-
136
explained.html. [Accessed 25 October 2022].
[69] "Firebase Realtime Database Store and sync data in real time," [Online]. Available:
https://firebase.google.com/products/realtime-database. [Accessed 25 October 2022].
[70] I. Vyas, "Advantages of Firebase Mobile App Development," 29 March 2022. [Online]. Available:
https://citrusbug.com/blog/advantages-of-firebase-mobile-app-development. [Accessed 25 October
2022].
[71] [Online].
137
Appendix A
Questionnaire 1:
This questionnaire was published on social media, and about 162 people responded. The goal was to ask
the target group for their opinion of the application idea, how much they need it, and whether they use
similar existing applications. It was used to gather requirements in Chapter 3, to identify the most
important features to include in our application so that it is distinctive and achieves its intended
goal.
Then a Likert scale from 1 to 5 was used to determine their opinion of the application idea,
corresponding to (1) strongly disagree, (2) disagree, (3) neutral, (4) agree, and (5) strongly agree. We
found that 76.5% of respondents strongly agree with it.
Then they were asked whether they had used a similar existing system and what its name was. 91.9% of
respondents said they had never used a similar application, and 0.7% said they had used an application
called “Family Link”.
Finally, they were asked about the features they would like to have in the application, and they gave us
many opinions and suggestions, such as: the application interfaces should be clear and easy to use, the
application should be hidden on the child's device, and alerts should be sent to parents accurately and
quickly.
Questionnaire 2:
This questionnaire was directed to Arab parents and was published on social media. About 280 people
responded. The goal was to define the target age group more precisely, to see whether parents can browse
their children's devices, and to ask them whether they would use the application and whether they
consider it ethical.
At the beginning, respondents were asked whether they have children, in order to direct them to the
appropriate section.
84.3% of the respondents have children, so they were directed to Section A.
63.6% of children use the WhatsApp application from their own devices.
63.1% of parents have the ability to browse the contents of their child's device, such as private
messages, and so on.
34.3% of parents consider it ethical to browse their child's device without the child's knowledge in
order to ensure the child's safety, while 22.9% believe that it is not ethical and do not browse their
children's devices, and 42.8% said their children allow them to browse their devices.
The idea of the application was presented, and parents were asked if they would use it or not, and
89.4% of parents answered yes.
As for the 15.7% of respondents who do not yet have children, these are the most important results that
we obtained:
54.5% consider it ethical for parents to browse their children’s devices without the children's
knowledge in order to preserve their safety, while 9.1% believe that it is immoral and that parents
should not browse their children’s devices, and 36.4% believe that the child should know that the
parent is browsing his device.
The idea of the application was presented, and respondents were asked whether they would advise
someone to use it or not, and 88.6% answered yes.
Questionnaire 3:
This questionnaire was used to have end users test the Etiqa'a | اتقاء application and assess its
efficiency and effectiveness, providing feedback to help improve it.
Appendix B
Gantt chart for project management plan (first semester)