Jebin 2

Effective Spam and Privacy Violation Detection using NLP-RF Approach

Jebin Earnest
Department of IT
Sathyabama Institute of Science and Technology
Chennai, India
Jebinearnest23@gmail.com

Dr.Ms.Jenitha
Department of IT
Sathyabama Institute of Science and Technology
Chennai, India
Merlinmaryjenitha.it@gmail.com

Angu Kaushik S
Department of IT
Sathyabama Institute of Science and Technology
Chennai, India
Sangukaushik003@gmial.com

ABSTRACT

The use of unsolicited digital communication, such as email spam, poses a significant threat in the online world. For instance, spammers may harvest email addresses that are used for online enrolment and subject innocent people to a variety of attacks. Another method of spamming is to create temporary email accounts that may be terminated after a set amount of time. This enables spammers to misuse these temporary accounts without disclosing their real account details, resulting in significant problems such as theft of user credentials and storage shortage. To overcome this issue, an efficient detection technique based on feature extraction and categorization must be implemented to detect spam and transient email addresses. This is possible using a Natural Language Processing (NLP)-related technique. The suggested method minimises the incidence of spam emails and increases spam email filtering accuracy. The NLP strategy allows the system to recognise natural languages spoken by individuals, while the Machine Learning component filters spam using several decision trees and a vague hub.

INTRODUCTION

Currently, spamming is spreading rapidly through various advanced communication channels, with email being the most widely used medium. Spammers propagate spam emails for promotional and malicious purposes, including financial disruption and defamation in both personal and official life. However, traditional methodologies used for email detection face challenges in detecting "zero-day" attacks, leading to increased false positive rates (FPR) and reduced detection accuracy. Because they contain all spam traits at the processing stage, ANN-based spam identification methods may be prone to errors in their outputs. Furthermore, DNS-based spamming botnet detection algorithms do not employ DNS capabilities for recognizing spam-sending bots. To address these challenges, several approaches have been introduced, including spam filtering tools and DNS-based methods. However, these methods have limited impact due to the longer time required for spam email recognition, higher memory usage, and detection errors at the output. Therefore, a novel approach that includes selection and classification has been introduced to detect social spammers who use various strategies to maximize their account data, including using existing clients' account data and impersonating genuine users. These spammers distribute malicious programs to reach their objectives.

On a daily basis, email providers are the most often utilized application providers for many customers. Emails are used for communication, as well as for transmitting and receiving messages in text format and media. Many customers use email accounts for daily chores such as digital banking, settling bills, and shopping on websites such as Amazon. Certain junk mail messages are sent only for the sake of
amusement, while others distribute malware such as trojans and viruses. Email applications are also used for promoting new arrivals on shopping sites to attract the interest of web users. Despite being the fastest, most convenient, and cheapest means of communication, email is severely hampered by the distribution of spam. Several text classification methods have been developed, including the use of Support Vector Machines (SVM) to partition emails in feature space and identify genuine messages.

Spam vs Ham

Spam emails are a major problem for individuals and organizations alike. Spammers use these emails to spread malware, steal personal and financial information, and engage in other malicious activities. They also waste valuable resources such as bandwidth and storage space, and can be a nuisance for users who have to sift through countless spam messages. In contrast, ham refers to legitimate and desired messages that are not spam. The use of email as a means of communication has increased exponentially over the years, which has led to a corresponding increase in the amount of spam. In fact, over 70% of commercial emails are spam emails, according to a 2009 report. This makes it necessary for individuals and organizations to have effective methods of filtering out spam emails. Developing anti-spam programs can be a challenging task, as spammers constantly change their tactics to avoid detection. Machine learning algorithms have been used in recent years to help classify emails as spam or ham. A semi-supervised strategy, in which a small quantity of labeled data is combined with a larger volume of unlabeled input, has been found to be effective in improving the accuracy of spam email filtering programs. Overall, spam emails are a serious problem that can have significant negative impacts on individuals and organizations. Effective methods of filtering out these messages are essential to ensure the security and reliability of email communication.

LITERATURE SURVEY

This section provides an overview of related work on email spam detection and email data optimisation. A synopsis of several techniques is also given.

In [1], the authors proposed the TrustRank algorithm for computing the reliability of every available site and recognizing the reliable ones with high trust values. This algorithm is based on a breadth-first search that begins from an existing set of trustworthy websites. They utilized other algorithms such as High PageRank and Inverse PageRank to rank the websites based on their reliability. In [2], the authors proposed the application of machine learning-based spam detection for accurate detection of spam. They used the Logistic Regression, K-nearest neighbor, and Decision Tree algorithms for classification of spam and ham messages in mobile device communications. In [3], the authors reviewed popular machine learning strategies for their potential in classifying spam emails, such as Bayesian classification, the K-nearest neighbour classifier, the artificial neural network classifier, the Support Vector Machine classifier, the Artificial Immune System classifier, and the Rough Sets classifier. Lastly, in [4], the authors proposed a machine learning model based on a hybrid bagging approach and implemented it using two machine learning algorithms, the Naive Bayes algorithm and the J48 (decision tree) algorithm, for detecting spam emails. They performed three experiments, and the J48 algorithm gave the best results.

The papers [1] and [2] focused on detecting reliable websites and detecting spam using machine learning-based techniques. On the
other hand, paper [3] reviewed several popular machine learning strategies for spam classification. Lastly, paper [4] proposed a machine learning model for detecting spam emails using a hybrid approach. Overall, the literature survey provides insights into various techniques for detecting spam emails and reliable websites.

Research and Methodology

The initial step in research involves identifying the problem statement, which in this case pertains to the challenge of detecting spam emails efficiently. To protect people from fraudulent emails that attempt to steal their access rights and online account data, it is critical to develop a model for distinguishing spam from valid messages. After identifying the research requirement, the study proceeds to formulating research questions that evaluate the effectiveness, feasibility, and performance of current approaches. Data extraction techniques are then employed to retrieve and synthesize the relevant information. The architecture design for this research uses ML models and NLP techniques to build a system for email spam detection. Finally, the study establishes classification parameters and examines the accuracy and loss percentage of the outcomes, which are subsequently recorded and published.

Figure 1 Architecture Design

This section describes the structural design of a system that employs ML models and NLP techniques to detect spam emails.

Algorithm:
The paper proposes a novel approach for detecting email spam using the Random Forest algorithm and Natural Language Processing techniques. The proposed approach pre-processes email messages by removing stop words and stemming, followed by feature extraction. After that, the Random Forest algorithm is trained on the extracted features to classify email messages as spam or non-spam.

Random Forest is an ensemble learning algorithm that uses decision trees for classification, where each tree is trained on a random subset of the data. During the classification phase, the data is passed through each tree in the forest, and the majority vote of the trees determines the final classification. In this paper, the algorithm is trained on the features extracted from email messages using NLP and n-grams. The approach was evaluated using the Enron-Spam dataset and achieved an accuracy of 97.9%. The proposed approach provides a robust and efficient solution for detecting email spam using NLP techniques and the Random Forest algorithm.

The proposed system begins by taking a CSV file as input, which contains a dataset consisting of both spam and ham emails. The data is split into training and testing sets and used for data pre-processing and cleaning. The data is then tokenized and lemmatized as the first step in the data preparation process. The extracted website links are then evaluated for accuracy in the Precision Rank model. This approach aims to classify the message as either 'Ham' or 'Spam.' The message passes through a test, classification, and precision model in which the document matrix stores the relative count of terms in the phrase. The punctuation percentage is then calculated in relation to the message length, and the results are graded as good or bad to determine whether the message is spam or ham.

The proposed system for detecting email spam starts with importing the libraries needed to execute the algorithm, followed by loading the spam dataset. Pre-processing of data is crucial before building a model, and involves applying NLP techniques to transform the text data. The next step is feature extraction, where meaningful features are generated from the data using domain expertise and ML techniques. Unimportant features are then eliminated to improve categorization. Once the dataset is ready, it is divided into training and testing datasets in a predetermined ratio. The training dataset is used to train the classifier and develop a model, whilst the test dataset is used to evaluate the classification model. The accuracy rate is then computed, and the system is modified to improve accuracy.

In the realm of Artificial Intelligence, Natural Language Processing (NLP) is concerned with human-computer communication. Machine Learning (ML) is commonly used in NLP to decode the meaning of natural language. Algorithms are employed to find and extract natural language rules, making unstructured data computationally accessible. However, certain text data may be unclear and lead to inaccurate results. NLP employs syntactic techniques, including lemmatization, word segmentation, part-of-speech tagging, sentence breaking, and stemming, to extract meaningful data from each sentence. These methods aim to condense multiple forms of a word, segment larger text into smaller components, identify the part of speech of each word, divide large text into sentences, and reduce inflected words to their simplest form.

A Comprehensive Collection of Labeled Emails for Supervised Learning

The proposed model uses supervised learning and requires a comprehensive dataset for training. In order to obtain this dataset, user communication data is collected and labeled as spam or ham. The dataset used for this study contains around 5000 email records and is shown in a screenshot in the paper. Each email in the dataset is labeled as either real (ham) or spam. The dataset comprises 5,574 rows and two columns.

Figure 2 Dataset

Figure 3 Dataset 2
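The pre-processing and feature steps named in the Algorithm section (tokenization, stop-word removal, n-grams, relative term counts, punctuation percentage, and the majority vote taken over the trees) can be illustrated with a minimal Python sketch. It uses only the standard library and is not the paper's implementation: the stop-word list, the function names, and the sample message are assumptions made for illustration, and stemming, lemmatization, and the training of the trees themselves are left out.

```python
import re
import string
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "for", "you", "your"}


def tokenize(message):
    """Lower-case the message and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", message.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return the contiguous n-grams of the token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_term_counts(tokens):
    """Relative count of each term: occurrences divided by the total number of terms."""
    total = len(tokens)
    if total == 0:
        return {}
    return {term: count / total for term, count in Counter(tokens).items()}


def punctuation_percentage(message):
    """Percentage of the non-space characters in the message that are punctuation."""
    chars = [c for c in message if not c.isspace()]
    if not chars:
        return 0.0
    punct = sum(1 for c in chars if c in string.punctuation)
    return 100.0 * punct / len(chars)


def majority_vote(predictions):
    """Combine per-tree labels into one label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical sample message, not taken from the paper's dataset.
message = "WIN a FREE prize!!! Click the link to claim your prize now."
tokens = remove_stop_words(tokenize(message))
features = {
    "terms": relative_term_counts(tokens),
    "bigrams": ngrams(tokens, 2),
    "punctuation_pct": punctuation_percentage(message),
}
print(features)
print(majority_vote(["spam", "ham", "spam"]))
```

In a full pipeline, feature dictionaries of this kind would be turned into numeric vectors and passed to a Random Forest implementation. The `majority_vote` helper only shows how the labels from the individual trees are combined at prediction time.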
Analysis of Length Characteristics in Spam and Ham Email Data Set:

The data collection is made up of spam and ham messages of various lengths. It should be highlighted that message length can be plotted per label to analyse its characteristics. The graphic shows that the amount of data in spam messages is greater than in ham messages. It can be concluded that spam messages tend to be longer.

Figure 4 Spam vs ham

Data Preparation Techniques for NLP based Spam Detection Model

In the field of NLP, data preparation is a crucial stage that involves processing the data before feeding it into an ML model. The model's success is strongly dependent on the reliability and layout of the data presented to it. Therefore, to improve the accuracy of the predictions made by the model, data preprocessing is a necessary step. Additionally, stop words, which are commonly used words in natural language and do not contribute much to the meaning of the text, are also eliminated during data preprocessing. This has the potential to increase the accuracy of the data and make it easier for the model to recognize patterns and make accurate predictions. However, data cleaning in NLP is challenging because language is a complex and flexible medium of communication that does not have the formal structure required for straightforward processing. Therefore, it is necessary to modify and clean the language to ensure that the computer can interpret it in the correct manner. This involves converting natural language phrases and paragraphs into structured data that the machine learning model can work with. Overall, data preparation is a critical step in NLP, and it is essential to ensure that the data is correctly processed and cleaned before feeding it into the machine learning model to achieve accurate and reliable predictions.
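The length comparison between spam and ham described above can be reproduced with a short Python sketch. The rows below are hypothetical examples in the two-column (label, text) layout the paper describes, not records from the actual dataset, and the function names are chosen only for illustration.

```python
from statistics import mean

# Hypothetical sample rows in a two-column (label, text) layout.
rows = [
    ("ham", "See you at lunch?"),
    ("ham", "Ok, call me later."),
    ("spam", "Congratulations! You have won a free voucher, reply now to claim it."),
    ("spam", "URGENT: your account is on hold, verify your details at once."),
]


def lengths_by_label(records):
    """Group message lengths (in characters) by label."""
    grouped = {}
    for label, text in records:
        grouped.setdefault(label, []).append(len(text))
    return grouped


def mean_length_by_label(records):
    """Average message length per label."""
    return {label: mean(values) for label, values in lengths_by_label(records).items()}


summary = mean_length_by_label(rows)
for label, avg in summary.items():
    print(f"{label}: mean length {avg:.1f} characters")
```

On real data the per-label lengths would typically be drawn as a histogram. The grouped lists returned by `lengths_by_label` are the input such a plot would need.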
RESULT:

One of the major problems with the internet infrastructure of modern communication systems is spam email. Spammers utilize temporary email accounts to send spam emails, which negatively impacts users and participating organizations. They do this by abusing the tools available for online communication. The proposed NLP-RF method aims to improve the identification of unwanted messages and phantom email addresses that can compromise users' privacy. The method uses machine learning algorithms to process and analyze text data from emails, and employs the random forest approach to classify incoming emails as either spam or legitimate messages.

This approach has several benefits, including its ability to quickly and accurately identify spam emails, which can save users time and reduce the risk of clicking on malicious links or downloading harmful attachments. Additionally, the method can help prevent the exposure of personal information by identifying temporary email addresses, which are often used for fraudulent activities or spamming.

Overall, the NLP-RF approach provides an effective solution to the problem of email spam and privacy violations, and can be implemented in various email clients and platforms to enhance user protection and security. By enhancing the dataset for stronger features and classification, the work could be developed further and made more accurate in the future. This approach may improve the privacy of email senders and receivers and minimize security threats.

REFERENCE:

[1] Priyanka Verma, Anjali Goyal and Yogita Gigras, “Email phishing: Text classification using natural language processing.”, Computer Science and Information Technologies, Vol. 1, No. 1, May 2020, pp. 1~12, ISSN: 2722-3221, DOI: 10.11591/csit.v1i1.p1-12.
[2] Nour El-Mawass, Paul Honeine, Laurent Vercouter, “SimilCatch: Enhanced social spammers detection on Twitter using Markov Random Fields.”, Information Processing and Management 57 (2020) 102317.
[3] Dongjie Liu and Jong-Hyouk Lee, “CNN based Malicious Website Detection by Invalidating Multiple Web Spams.”, Information Technology Research Center, 10.1109/ACCESS.2020.2995157, IEEE Access.
[4] Nikhil Kumar, Sanket Sonowal, Nishant, “Email Spam Detection Using Machine Learning Algorithms.”, Proceedings of the Second International Conference on Inventive Research in Computing Applications (ICIRCA-2020), IEEE Xplore Part Number: CFP20N67-ART; ISBN: 978-1-7281-5374-2.
[5] Zhiwei Guo, Yu Shen, Ali Kashif Bashir, Muhammad Imran, Neeraj Kumar, Di Zhang, Keping Yu, “Robust Spammer Detection Using Collaborative Neural Network in Internet of Thing Applications.”, University of Glasgow, 10.1109/JIOT.2020.3003802, IEEE Internet of Things Journal.
[6] Malika Ben Khalifa, Zied Elouedi, Eric Lefevre, “An Evidential Spammer Detection based on the Suspicious Behaviors’ Indicators.”, Auckland University of Technology. August 11, 2020.
[7] Rajavardhan Reddy Marikanti, Katkoori Shiva Prasad, Hannoop Kumar Suddala, K. Bala Thripura Sundari, “Detection of Phishing Attacks using Natural Language Processing and Logistic Regression Model.”, UGC Care Group I Listed Journal, ISSN: 2278-4632, Vol-10 Issue-6 No. 1 June 2020.
