
Enhanced accuracy of detecting fraudulent product reviews

using a fusion machine learning approach


Abstract
Spam product reviews are fake or misleading reviews, often posted to promote a product or service or to damage a competitor's reputation. They can be difficult to detect, as they are typically written to appear legitimate and may be backed by fake profiles or misleading information. Spam reviews harm both businesses and consumers: they distort ratings, mislead consumers about the quality of a product or service, and damage the credibility and trustworthiness of the business, while consumers who purchase a product on the basis of fake reviews face confusion and frustration and lose trust in the review process. Businesses and review platforms therefore need measures to identify and eliminate spam product reviews, such as keyword filtering, checking for suspicious patterns in the reviews, applying machine learning techniques, and checking the reputation of the reviewer, and these measures must be continually monitored and updated to remain effective. In this paper we propose a hybrid technique for identifying spam product reviews. We first introduce the task of detecting spam in online product reviews and give a common definition of spam reviews. We then summarize the existing methods on publicly available datasets. Finally, we compare the performance of traditional machine learning, deep learning, and the proposed hybrid classifiers, and present a literature analysis, discussion, and future directions.

KEYWORDS:
Fake reviews detection; methodologies; machine learning; web scraping; information retrieval.

1. INTRODUCTION
The rise of digital technology has made it easy for people to purchase products online. With the
help of Web 2.0, users can now share their experiences and opinions about their purchases. These reviews are important because they help customers make informed decisions and provide feedback to the organizations selling the goods. Because of the sheer volume of reviews, it is difficult for organizations to extract the required information and gauge overall consumer sentiment about their products. Opinion mining allows them to analyze the data and identify fake or irrelevant reviews [1].
A fake product review is a review that is not genuine and is written with the intention of promoting
a product or service, or of damaging the reputation of a competitor. Fake reviews can be difficult
to detect, as they are often written to appear legitimate and may include fake profiles or misleading
information. Fake product reviews can be harmful to businesses and consumers. For businesses,
they can distort ratings and reviews and mislead consumers about the quality of a product or
service, damaging the credibility and trustworthiness of the business. For consumers, they can lead
to confusion and frustration when they purchase a product based on fake reviews, and can also
harm their trust in the review process[2].
It's important for businesses and review platforms to have measures in place to identify and
eliminate fake product reviews. This can include using keyword filtering, checking for suspicious
patterns in the reviews, using machine learning techniques, and checking the reputation of the
reviewer. It's also important to continually monitor and update these measures to ensure that they
are effective at detecting fake reviews.
Spam review identification is important for businesses and consumers as it can prevent misleading
information and protect both consumers and businesses from false or negative reviews. There are
several ways to identify fake reviews, such as keyword filtering, checking for suspicious patterns,
using machine learning, checking the reputation of the reviewer and using CAPTCHAs or other
verification methods. Implementing multiple approaches and continually monitoring and updating
methods can ensure that fake reviews are effectively identified and eliminated. However, spam review identification also has drawbacks, such as the risk of flagging legitimate reviews as spam, the resource-intensive process of review and classification, and the difficulty of identifying spam reviews accurately, which can lead to a high number of false positives or missed spam reviews [3]–[5].
1.2 Current Issues in spam review identification using ML and DL
There are several current issues in spam review identification using machine learning and deep
learning techniques [3]–[5]. Machine learning algorithms can face several challenges when applied to spam review identification, including data imbalance, lack of diverse data, lack of explainability, adversarial attacks, and evolving spam techniques. Data imbalance occurs when spam
reviews make up a small minority of the overall dataset, which can affect the performance of
machine learning algorithms. A lack of diverse data can make it difficult for the algorithm to
generalize well to new, unseen data. Some machine learning models are not easily interpretable,
making it difficult to understand how the model is making decisions and to identify potential
biases. Machine learning models can also be vulnerable to adversarial attacks, in which an attacker
manipulates the input data to cause the model to make incorrect predictions. Finally, spammers
are constantly finding new ways to evade detection, making it difficult to develop effective
machine learning models for spam review identification.
1.3 Various techniques used for text review classification.
There are several techniques that can be used to identify fake reviews, which are reviews that are
not genuine and are written with the intention of misleading or manipulating others [17] – [19].
Some of these techniques include:
• Sentiment analysis: This involves analyzing the sentiment or emotion expressed in a review
to determine whether it is genuine or fake. Reviews that are written in a highly positive or
negative manner may be more likely to be fake.
• Textual analysis: This involves analyzing the content of the review to identify patterns or
characteristics that are commonly found in fake reviews. For example, reviews that use
repetitive or generic language, or that contain a high number of spelling and grammar
errors, may be more likely to be fake.
• Reviewer analysis: This involves analyzing the characteristics of the reviewer, such as the
number of reviews they have written, the types of products they have reviewed, and the
consistency of their ratings. Reviewers who have written a large number of reviews in a
short period of time, or who have given consistently high or low ratings, may be more
likely to be fake.
• Network analysis: This involves analyzing the relationships between reviewers and the
products they have reviewed, in order to identify patterns of fake reviews. For example,
reviewers who have reviewed a large number of products from the same manufacturer or
seller may be more likely to be fake.
• Machine learning: This involves training a machine learning model on a labeled dataset of
genuine and fake reviews, in order to learn the characteristics of each type of review. The
model can then be used to classify new reviews as genuine or fake.

These techniques can be used individually or in combination to identify fake reviews; using a combination of approaches increases the accuracy and reliability of the fake review identification process. To address the issues mentioned above, we propose a combined ML-DL approach. Combining machine learning (ML) and deep learning (DL) techniques
can be an effective way to identify spam reviews. Combining ML and DL techniques can be
effective for spam review identification because DL algorithms can analyze the content and
context of reviews to identify spam, while ML algorithms can be used to classify reviews based
on patterns in the data. This can help to improve the accuracy and effectiveness of spam review
identification systems.
The main contributions of this paper are summarized below:
• First, we analysed the effects of data cleaning, normalization, text tokenization, punctuation removal, stop-word removal, stemming, POS tagging, and NLP processing for converting raw reviews into meaningful text. We used benchmark datasets and Amazon product review datasets for accurate categorization and analysis.
• Second, a good feature encoding approach is essential for representing customer reviews numerically for classification and analysis. This step converts dataset samples into numeric feature vectors; since review texts vary in length, the feature encoding converts each review into a fixed-length vector.
• Third, we employed machine learning and deep learning models with multiple layers and parameters to categorize the data and detect spam reviews. These models outperformed earlier methods, and their performance is compared with other models using F1-score, precision, recall, and accuracy.
The paper is organized as follows. Chapter 1, the Introduction, presents the background, goal, and purpose of the work. Chapter 2, the Literature Review, discusses related work in text mining, machine learning, deep learning, and feature extraction and selection, along with the core ideas used in this project to improve its performance. Chapter 3 describes the steps used to carry out the project, the implementation method, and the system architecture. Chapter 4 covers data collection and dataset description, the experimental results, and the comparison of machine learning, deep learning, and hybrid algorithms on different performance parameters. Chapter 5 concludes the work and outlines future directions.

2. RELATED WORK
Muhammad Fayaz et al. [6] created an ensemble model that combines the predictions of KNN, RF, and MLP to classify product reviews as either spam or non-spam. Their evaluation showed that the ensemble outperformed the individual models in classification accuracy, identifying fake reviews with 88.13% accuracy.
Shwet Mani et al. [7] used simple n-gram features in the first phase of their study and applied three classification algorithms, namely SVM, Random Forest, and Naïve Bayes, to classify product reviews; Naïve Bayes was the most accurate at 87.12%. In the second phase, two different ensemble techniques were applied, and the stacking ensemble model performed best with an accuracy of 87.68%.
To protect the interests of consumers and e-commerce sites, Minu Susan Jacob et al. [8] developed a system that can identify and remove fake reviews. Their paper proposes a framework that detects unfair reviews on Amazon using sentiment analysis, applied to a set of consumer reviews collected from Amazon.
Deepika Vachane et al. [9] present a framework that combines sentiment analysis and LSA to find spam in product reviews; the proposed system uses the NetSpam algorithm to analyze and detect various types of spam.
G. M. Shahariar et al. [10] developed deep learning methods for detecting spam reviews, including the Multi-Layer Perceptron, the Convolutional Neural Network, and the RNN, and also applied traditional classifiers such as K-Nearest Neighbor and the Support Vector Machine for comparison.
Komal Dhingra et al. [11] identified potential spammers based on four fuzzy input variables, generated 81 fuzzy rules, and evaluated the group using the FSL algorithm. Because the volume, velocity, and variety of reviews meet the three V's of big data, Hadoop was used for analysis and storage, and a sample Amazon reviews dataset was used to demonstrate the proposed algorithm's accuracy of 80.71%.
Soheil Jamshidi et al. [12] developed a method to identify explicit incentivized reviews (EIRs) by collecting several datasets, showed that the characteristics of EIRs and normal reviews differ, and discussed how Amazon's ban affected the prevalence of such reviews. The study revealed that sellers' promotional campaigns influenced the number of reviews submitted for the sample products.
Sakshi Shringi et al. [13] present a hybrid GWOK algorithm that combines the basic GWO algorithm with k-Means clustering to identify spam reviews. The results show that the proposed algorithm surpasses current techniques, achieving accuracies of 80.43% on synthetic spam reviews, 64.74% on movie reviews, and 75.01% on the Yelp dataset.
P. Bhuvaneshwari et al. [14] present a deep learning framework, CNN-BiLSTM, that learns a document-level representation of the content by weighting the words in each sentence. The CNN component learns sentence structure, and the model then combines sentence-level features with contextual information to identify spam reviews.

Table 1. Comparison of various work

| Author | Dataset | Algorithm used | Outcome | Findings | Advantages |
|---|---|---|---|---|---|
| Muhammad Fayaz et al. [6] | Yelp | Ensemble classifier (MLP, KNN, and RF) | Accuracy = 88.13% | Feature selection is missing; including all features reduces the accuracy | Ensemble learning improves the performance of the model |
| Shwet Mani et al. [7] | Gold Standard Dataset | SVM, Naïve Bayes, and Random Forest | Accuracy = 87.43% | Time consuming, as implemented in two phases | Ensemble learning improves the performance of the model |
| Nishant Jain et al. [15] | LMRD, DRD | CNN-GRU | Accuracy = 88.9% | Restricts the review text to a smaller fixed length | Deep learning improves the performance of the model |
| Minu Susan Jacob et al. [8] | Amazon product reviews | CNN, CNN-LSTM, RNN, LR | Accuracy = 80.33% | Implements sentiment classification | Opinion mining and machine learning improve the performance of the model |
| Ilakiyaselvan N et al. [16] | Amazon reviews | Decision Tree, RF | Accuracy = 93.33% (RF) | Only specific to reviews of TVs | Usage of ML for better identification |
| Saleh Nagi Alsubari et al. | Ott dataset | NB, SVM, AB, RF | Accuracy = 95% | Dataset limited to hotel data only | Usage of ML for better identification |
| Deepika Vachane et al. [9] | Amazon reviews | NetSpam algorithm | Accuracy = 92.06% | Finds spammer community | Machine learning improves the performance of the system |
| G. M. Shahariar et al. [10] | Ott and Yelp datasets | CNN, LSTM, MLP | Accuracy = 94.56% and 96.5% for Ott and Yelp, respectively | Used less data | Deep learning improves the performance of the model |
| Komal Dhingra et al. [11] | Amazon reviews | Fuzzy based | Accuracy = 80.71% | Uses unsupervised learning | Fuzzy Ranking Evaluation Algorithm and Hadoop improve the performance of the model |
| Sakshi Shringi et al. [13] | Synthetic Spam Reviews, Movie Reviews, and Yelp dataset | Grey Wolf Optimizer using k-Means clustering | Accuracy = 80.43% (Synthetic Spam Reviews), 64.74% (Movie Reviews), 75.01% (Yelp) | Clustering method outperformed existing methods | Hybrid grey wolf optimizer clustering method improves the performance of the model |
| P. Bhuvaneshwari et al. [14] | YelpZip dataset | CNN, Bi-LSTM | Accuracy = 87.3% | Uses self-attention mechanism | Self-attention based CNN and bi-directional LSTM improve the performance of the model |

3. PROPOSED SYSTEM
Hybrid machine learning is a term used to describe machine learning approaches that combine
multiple different techniques or algorithms. This can be useful when different types of techniques
are complementary or when combining approaches can lead to improved performance on a
particular task. Hybrid machine learning approaches that combine deep learning with other
techniques are often used to solve complex problems where deep learning alone may not be
sufficient. For example, a hybrid approach might use a deep learning model to extract features
from data and then use a more traditional machine learning algorithm to make a prediction based
on those features. This can be particularly useful in tasks such as language translation or image
recognition, where deep learning has proven to be very effective but may not capture all of the
necessary information on its own. There are many different ways that deep learning and other
techniques can be combined in a hybrid approach, and the specific combination used will depend
on the specific problem being solved and the available data. Here we propose hybrids of CNN + NB, CNN + RF, and CNN + SVM, and compare their effectiveness against existing classifiers.
3.1 System Architecture
The proposed method is designed for the requirements of review classification and consists of three steps: pre-processing, feature selection, and evaluation. In the first step, the input product reviews pass through a number of pre-processing operations, including stop-word removal, stemming, pruning, and term weighting. The goal of this step is to turn the review text into a form that classification algorithms can process. The second step extracts features from the pre-processed dataset. In the third and final step, a supervised learning algorithm is evaluated on the selected features from the feature set. Figure 2 shows these stages. The important steps involved in spam detection are described below.
Figure 2. Proposed System Architecture

3.1.1 Pre-processing steps


Pre-processing is the process of preparing and cleaning the data before it is used to train a machine
learning or deep learning model. Pre-processing is an important step in the model development
process because it can help improve the performance and generalization of the model. There are
several common pre-processing steps that are typically performed on the input data:
• Data cleaning: This involves identifying and correcting errors or missing values in the data.
• Data transformation: This involves scaling or normalizing the data, as well as applying any
necessary transformations to make the data more suitable for the model.
• Feature selection: This involves selecting the most relevant features from the data to use as
inputs to the model.
• Data augmentation: This involves generating new data samples from the existing data by
applying various transformations, such as rotating or shifting the data. This can be used to
increase the size of the dataset and improve the model's ability to generalize.
Pre-processing is a crucial step in the model development process and can greatly affect the
performance of the model. It is important to carefully consider the appropriate pre-processing steps
for the specific task and dataset at hand.
Here, the product review data is first cleaned to remove stopwords and to merge words that share the same root, which reduces the size of the data vectors. Pre-processing tasks are performed with the NLTK library. Once a clean dataset is obtained, features are extracted from it using several techniques. The number of times each word occurs is also used as a feature value in the learning process of a classifier. Because the bag-of-words model produces a very large number of features, additional pre-processing is needed to reduce the high dimensionality of the term space. The most common pre-processing steps for review categorization are stopword removal and stemming. Stopword removal takes out of the feature space common words that carry no discriminating information; for example, "a," "the," and "that" appear with roughly the same frequency in all texts and therefore tell us nothing useful for predicting the class. Stemming reduces words to their root forms, so that words sharing the same root but differing in affixes are recognized in the feature space as the same term; for example, "computer," "computing," "computation," and "computes" all reduce to the root "compute." In this research, these steps are performed with NLTK.
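As a minimal sketch of these cleaning steps (the exact pipeline in our implementation may differ), the following Python snippet lower-cases a review, strips punctuation, removes NLTK's English stopwords, and stems the remaining tokens with the Porter stemmer; the sample review text is purely illustrative.

```python
# Minimal pre-processing sketch with NLTK: stopword removal + Porter stemming.
# The sample review is illustrative; the full pipeline may include more steps.
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("stopwords", quiet=True)

STOP_WORDS = set(stopwords.words("english"))
STEMMER = PorterStemmer()

def preprocess(review):
    """Lower-case, keep letters only, drop stopwords, and stem each token."""
    text = re.sub(r"[^a-z\s]", " ", review.lower())
    tokens = text.split()
    return [STEMMER.stem(tok) for tok in tokens if tok not in STOP_WORDS]

print(preprocess("The computer is computing the computation perfectly!"))
# -> ['comput', 'comput', 'comput', 'perfectli']
```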
3.1.2 Feature Extraction Techniques
Feature extraction is the process of selecting and extracting relevant information from a dataset to
be used as input to a machine learning model. In the context of fake review identification, feature
extraction is used to select and extract features from the text of a review that are relevant for
identifying fake reviews. There are many different techniques that can be used for feature
extraction in fake review identification, including:
• Bag-of-words: This technique represents the text of a review as a vector of counts of the
words that appear in the review. It does not take into account the order of the words, but it
can be effective for identifying common words or phrases that are used in fake reviews.
• Term frequency-inverse document frequency (TF-IDF): This technique assigns a weight
to each word in a review based on how common it is in the review compared to the rest of
the dataset. It can be used to identify words that are more or less common in fake reviews
than in genuine reviews.
• Part-of-speech tagging: This technique involves identifying the parts of speech (nouns,
verbs, etc.) in a review and extracting the words that are most relevant for identifying fake
reviews. For example, you might focus on identifying words that are commonly used in
fake reviews, such as "free" or "discount."
The specific technique or combination of techniques used for feature extraction will depend on the
specific requirements and constraints of the task, as well as the nature of the data and the type of
model being used. It is also important to consider how the features will be used in the model and
how they will contribute to the model's performance.
A. Natural Language Processing
Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial
intelligence that deals with the interaction between computers and human (natural) languages. The
goal of NLP is to design and develop algorithms and models that can understand, interpret, and
generate human language in a way that is useful for various applications. Some common
applications of NLP include language translation, text classification, information extraction, and
question answering.
NLP tasks can be broadly classified into two categories: symbolic and statistical. Symbolic NLP
approaches rely on hand-crafted rules and heuristics to process and understand language, while
statistical NLP approaches use machine learning algorithms to learn patterns and relationships in
large amounts of language data. Modern NLP approaches often combine both symbolic and
statistical techniques.
There are many different techniques and approaches used in NLP, including lexical analysis,
parsing, semantic analysis, and discourse analysis. Some of the key challenges in NLP include
dealing with the complexity and variability of natural language, handling ambiguity and context-
dependent meaning, and handling language generation tasks such as summarization and machine
translation.
B. TF-IDF
Term frequency-inverse document frequency (TF-IDF) can be used as a feature in machine
learning models for identifying spam reviews on Amazon or other online platforms. The idea is
that spam reviews tend to use certain words or phrases more frequently than non-spam reviews,
and TF-IDF can help to identify these words and give them more weight in the model.
For example, a model trained on Amazon reviews might use TF-IDF to identify words such as
"free," "discount," or "click here" as indicators of spam. These words might be more common in
spam reviews than in non-spam reviews, and so they would be given a higher weight in the model.
On the other hand, words that are more common in non-spam reviews, such as product names or
descriptions, would be given a lower weight.
To use TF-IDF in a spam detection model, you would first need to collect a large dataset of
Amazon reviews, both spam and non-spam. You would then need to pre-process the data by
converting the text of each review into a numerical representation, such as a bag-of-words model
or a term frequency matrix. You would then calculate the TF-IDF weighting for each word in each
review and use these weights as features in a machine learning model, such as a logistic regression
or a support vector machine.
The specific approach and model used will depend on the specific requirements and constraints of
the task. It is also important to consider other factors that might affect the performance of the
model, such as the size and quality of the training data, the choice of hyper parameters, and the
ability to handle class imbalances in the data. The formula for TF is as follows:
TF(t) = (Number of times the term t appears in a document) / (Total number of terms in the document).
The inverse document frequency (IDF) measures how important a word is based on how rarely it appears across the whole document collection; unlike the bag-of-words model, it assigns each word a weight that reflects its importance in the text. IDF is computed as:
IDF(t) = log(Total number of documents / Number of documents that contain the term t).
Continuing with the example in Table 2, we compute the tf-idf vector and score for each document in the collection C. Some words appear in more than one document, while others appear in only one. The total number of documents is N = 3, so the idf values of the words (using base-2 logarithms) are:
angles: log2(3/1) = 1.584
los: log2(3/1) = 1.584
new: log2(3/2) = 0.584
post: log2(3/1) = 1.584
times: log2(3/2) = 0.584
york: log2(3/2) = 0.584
Table 2: Example of the tf-idf word model

|    | angles | los | new | post | times | york |
|----|--------|-----|-----|------|-------|------|
| S1 | 0 | 0 | 0.584 | 0 | 0.584 | 0.584 |
| S2 | 0 | 0 | 0.584 | 1.584 | 0 | 0.584 |
| S3 | 1.584 | 1.584 | 0 | 0 | 0.584 | 0 |
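In practice the TF-IDF weights need not be computed by hand; the sketch below uses scikit-learn's TfidfVectorizer on three illustrative one-line documents similar to the example above (note that TfidfVectorizer applies a smoothed natural-log IDF and L2 normalization by default, so its weights differ slightly from the base-2 values in Table 2).

```python
# TF-IDF feature extraction sketch with scikit-learn.
# The three one-line "documents" are illustrative placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "new york times",
    "new york post",
    "los angeles times",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(documents)   # sparse matrix, shape (3, vocabulary size)

print(vectorizer.get_feature_names_out())     # vocabulary learned from the documents
print(tfidf.toarray().round(3))               # per-document TF-IDF weight vectors
```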

C. Sentiment Analysis
Sentiment analysis is a technique used to identify the sentiment or emotion expressed in a piece of
text, such as a review. It is often used in the context of fake review identification to help identify
reviews that are biased or misleading.
There are several approaches to sentiment analysis, including rule-based approaches, which rely
on hand-crafted rules and heuristics to identify sentiment, and machine learning approaches, which
use supervised or unsupervised learning algorithms to learn patterns in the data and make
predictions about the sentiment of a given text.
In the context of fake review identification, sentiment analysis can be used to identify reviews that
are overly positive or negative, as fake reviews are often written with the intention of misleading
the reader. For example, a fake positive review might use very positive language and give the
product a high rating, while a fake negative review might use negative language and give the
product a low rating.
To use sentiment analysis in fake review identification, you would first need to collect a dataset of
reviews, both fake and genuine. You would then need to pre-process the data by converting the
text of each review into a numerical representation, such as a bag-of-words model or a term
frequency matrix. You would then apply a sentiment analysis technique to the data, such as a
machine learning classifier, to predict the sentiment of each review.
The specific approach and model used will depend on the specific requirements and constraints of
the task. It is also important to consider other factors that might affect the performance of the
model, such as the size and quality of the training data, the choice of hyper parameters, and the
ability to handle class imbalances in the data.
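The paper does not prescribe a particular sentiment model; as one hedged example, NLTK's VADER analyzer can score each review's polarity, and the resulting compound score can be added as a numeric feature for the classifiers described later.

```python
# Sentiment scoring sketch with NLTK's VADER analyzer (one possible choice;
# other sentiment models could be substituted). Sample reviews are illustrative.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)

analyzer = SentimentIntensityAnalyzer()

reviews = [
    "Absolutely perfect!!! Best product ever, buy it now!",       # suspiciously extreme
    "Works fine, but battery life is shorter than advertised.",   # measured, mixed tone
]

for text in reviews:
    scores = analyzer.polarity_scores(text)   # neg / neu / pos / compound in [-1, 1]
    print(f"{scores['compound']:+.3f}  {text}")
```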
3.1.3 Split the Train Test:

The train-test split is used to estimate how well the ML, DL, and hybrid algorithms perform when making predictions. It is a quick and simple way to compare the results of our machine learning, deep learning, and hybrid models. With the default settings, 20% of the data forms the test set and 80% forms the training set. Before evaluating the model, the dataset is divided into train and test sets: the train set, whose labels are known, is used to fit the model, and the test set is held out and used only for making and evaluating predictions.
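A minimal sketch of this split with scikit-learn is shown below; the small synthetic arrays only stand in for the encoded review vectors and spam labels produced by the earlier steps.

```python
# 80% / 20% train-test split sketch with scikit-learn (synthetic placeholder data).
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 6)              # placeholder feature vectors (6 features per review)
y = np.random.randint(0, 2, size=100)   # placeholder labels: 1 = spam, 0 = genuine

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.20,     # 20% of the samples are held out as the test set
    random_state=42,    # fixed seed for reproducibility
    stratify=y,         # preserve the spam / non-spam ratio in both splits
)
print(X_train.shape, X_test.shape)      # (80, 6) (20, 6)
```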

3.1.4 ML / DL Algorithms
We carried out our work on the Amazon Product Review Dataset and implemented the algorithms with scikit-learn and TensorFlow. The algorithms used include the Support Vector Machine (SVM), Neural Network (NN), and Convolutional Neural Network (CNN); detailed descriptions of these existing classifiers are given below.

In machine learning and deep learning, a classifier is a model that is trained to predict the class or
category of an input sample. Classifiers are used in a wide range of applications, including spam
filtering, image classification, and natural language processing[22]–[24]. There are many different
types of classifiers, including linear classifiers, support vector machines (SVMs), decision trees,
and neural network-based classifiers such as convolutional neural networks (CNNs) and recurrent
neural networks (RNNs). The choice of classifier depends on the nature of the input data and the
task at hand.

• Linear classifiers, such as logistic regression, are based on the assumption that the input
data is linearly separable, meaning that it can be separated into different classes by a linear
boundary. These classifiers are simple and efficient, but they may not be suitable for more
complex datasets.
• SVMs are a type of linear classifier that seek to find the hyperplane in the feature space
that maximally separates the different classes. They are effective for high-dimensional
datasets and can handle non-linear boundaries by using kernel functions to transform the
data into a higher-dimensional space.
• Decision trees are a type of classifier that use a tree structure to make predictions based on
the values of the input features. Each node in the tree represents a decision based on a
feature, and the branches represent the possible outcomes of that decision. Decision trees
are simple to understand and interpret, but they may not be as accurate as some other
classifiers.
• Neural network-based classifiers, such as CNNs and RNNs, are capable of learning
complex relationships in the data and can achieve state-of-the-art results on many tasks.
However, they require a large amount of labeled data and computational resources to train.

A. SVM
Support Vector Machines (SVMs) are a type of supervised machine learning algorithm that can be
used for classification or regression tasks. The goal of an SVM is to find the hyperplane in a high-
dimensional space that maximally separates the positive and negative examples. The distance
between the hyperplane and the nearest data points is known as the margin. SVMs try to maximize
the margin between the two classes of data points.
SVMs are particularly useful in cases where the number of dimensions is much greater than the
number of samples, as they tend to be more robust than other algorithms in such situations. They
can also handle cases where the data is non-linearly separable by using the so-called "kernel trick"
to transform the data into a higher-dimensional space where it becomes linearly separable. In
summary, SVMs are a powerful tool for classification and regression tasks and are particularly
effective in high-dimensional spaces and when the data is non-linearly separable.
The basic idea behind SVMs is to find a hyperplane that maximally separates the different classes
(e.g. real vs fake reviews) in the feature space. Once the hyperplane is found, new samples can be
easily classified by checking on which side of the hyperplane they fall. SVMs have been shown to
perform well in text classification tasks and have been used in various studies for fake review
detection. It is important to note that SVM alone may not be the best solution for this problem and
often it is used in combination with other techniques such as feature engineering, natural language
processing and deep learning[21], [25].
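As a hedged illustration of such a combination, the pipeline below couples TF-IDF features with a linear SVM in scikit-learn; the handful of labelled reviews are toy examples, not data from our experiments.

```python
# Illustrative TF-IDF + linear SVM pipeline for review classification
# (the tiny labelled set below is a toy example only).
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

train_texts = [
    "Best product ever, buy now, free gift, click here!",
    "Amazing!!! Five stars!!! Unbelievable discount!!!",
    "Decent blender, a little loud but does the job.",
    "Arrived on time, quality matches the description.",
]
train_labels = [1, 1, 0, 0]   # 1 = spam / fake, 0 = genuine

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(train_texts, train_labels)

print(model.predict(["Incredible!!! Free bonus if you click here"]))   # expected: [1]
```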
B. Neural Network
A neural network is a type of machine learning model that is inspired by the structure and function
of the brain. It consists of layers of interconnected "neurons," which process and transmit
information. Each neuron receives input from other neurons, performs a computation on that input,
and produces an output that is transmitted to other neurons in the next layer. Neural networks are
particularly useful for tasks that involve pattern recognition or data that is difficult to process using
a traditional, rule-based approach. They have been successful at tasks such as image and speech
recognition, natural language processing, and even playing games like chess.
There are many different types of neural networks, including feedforward networks, convolutional
neural networks, recurrent neural networks, and autoencoders. The specific architecture of a neural
network depends on the task it is being used for. Recurrent architectures, for example, can process sequential data such as text by maintaining an internal state that stores information about previous inputs. These
architectures have been used in many studies for fake review detection and have been shown to
achieve high performance. However, it is important to note that fake review detection is a complex
task and NNs alone may not be the best solution for this problem and often it is used in combination
with other techniques such as feature engineering and natural language processing [26].
C. CNN
CNN (Convolutional Neural Network) is a type of artificial neural network that is particularly
effective at recognizing patterns and features in images. It is a deep learning algorithm that is
commonly used in image and video recognition tasks, such as object classification and face
recognition. CNNs are composed of a series of layers that process input data and extract relevant
features. The first layers of a CNN typically consist of convolutional layers, which apply a series
of filters to the input data to identify patterns and features. These filters are called kernels or
weights, and they are learned through training the network on a labeled dataset. After the
convolutional layers, the network typically includes pooling layers, which downsample the data
and reduce the dimensionality of the feature maps. This helps to reduce the complexity of the
model and improve its ability to generalize to new data.
The final layers of a CNN are typically fully-connected layers, which process the extracted features
and make predictions based on them. The output of the fully-connected layers is typically a
probability distribution over a set of classes, indicating the likelihood that the input belongs to each
class. CNNs are widely used in a variety of applications, including image classification, object
detection, and facial recognition. They have been successful in achieving state-of-the-art results
on many tasks and have become a key tool in the field of computer vision.
Convolutional Neural Networks (CNNs) are a type of neural network that can be used for image
and text classification tasks, including identifying fake reviews. The main characteristic of CNNs
is the use of convolutional layers, which are designed to identify local patterns in the input data by
applying a set of filters. These filters "slide" over the input data (e.g. text) and extract features such
as edges, shapes, and textures. These features are then passed through several layers of neural
network, where they are combined and transformed in a hierarchical manner to learn more complex
representations of the input data. In the case of fake review detection, CNNs can be trained to learn
the patterns in the text that are indicative of fake reviews, such as specific words or phrases that
are commonly used in fake reviews. CNNs have been used in many studies for fake review
detection and have been shown to achieve high performance in comparison to other methods.
However, it is important to note that fake review detection is a complex task, and CNN alone may
not be the best solution for this problem and often it is used in combination with other techniques
such as feature engineering and natural language processing [27].
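For reference, a minimal 1D-CNN text classifier can be sketched in Keras as follows; the vocabulary size, sequence length, and layer widths are illustrative assumptions, not the exact configuration used in our experiments.

```python
# Minimal 1D-CNN text classifier sketch in Keras / TensorFlow.
# Vocabulary size, sequence length, and layer widths are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 10000   # assumed vocabulary size after tokenization
MAX_LEN = 200        # assumed (padded) review length in tokens

inputs = tf.keras.Input(shape=(MAX_LEN,))
x = layers.Embedding(VOCAB_SIZE, 64)(inputs)                  # token ids -> dense vectors
x = layers.Conv1D(128, kernel_size=5, activation="relu")(x)   # local n-gram features
x = layers.GlobalMaxPooling1D()(x)                            # strongest response per filter
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(1, activation="sigmoid")(x)            # probability that the review is spam

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```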
3.1.5 Proposed Combined Machine Learning and Deep Learning Approach
Sentiment analysis, also known as opinion mining, is the use of natural language processing, text
analysis, and computational linguistics to identify and extract subjective information from source
materials. The goal of sentiment analysis is to determine the attitudes, opinions, and emotions of
a speaker or writer with respect to some topic or the overall contextual polarity of a document.
A. CNN + RF / CNN +SVM / CNN + NB
One popular approach to classify reviews is to use a combination of convolutional neural networks
(CNNs) and random forests (RFs) / Support Vector Machine (SVM) / Naïve Bayes (NB)[28]–[30].
CNNs are a type of neural network that are often used for image recognition and have been adapted
for use in natural language processing tasks like sentiment analysis. The main advantage of using
a CNN for sentiment analysis is that it is able to learn the local patterns and features within a
sentence that are most indicative of the sentiment. Random forests, on the other hand, are a type
of decision tree-based ensemble learning method. They are particularly well suited for sentiment
analysis due to their ability to handle large datasets and their robustness to overfitting.
When used in combination, CNNs and RFs can complement each other's strengths to achieve
improved performance on sentiment analysis tasks. The CNN can be used to extract features from
the text, such as n-grams or word embeddings, which are then fed into the RF for classification.
This can be done by using the CNN to train a set of word embeddings for the text, which are then
used as input to the RF classifier. The output of the CNN can also be used as input to the RF as
feature representation of the text.
Support vector machines (SVMs) are a type of supervised learning algorithm that can be used for
classification and regression tasks. One of the main advantages of using an SVM for sentiment
analysis is that it is able to perform well with high-dimensional data, such as text, by using a
technique called kernel trick.
Naive Bayes classifiers, on the other hand, are a family of simple probabilistic classifiers based on
applying Bayes' theorem with strong (naive) independence assumptions between the features. One of the main advantages of a Naïve Bayes classifier is its simplicity: it is easy to implement and computationally efficient. It is also particularly well suited to text classification problems and can be trained on small datasets.
Here we perform CNN and RF / SVM / NB in combination by stacking them. In this approach, the
CNN is used as a feature extractor, and the output of the CNN is passed as input to the RF / SVM
/ NB classifier. This allows the RF / SVM / NB to take advantage of the learned features extracted
by the CNN and improve its accuracy. The RF / SVM / NB can use the CNN extracted features to
improve its classification accuracy. In conclusion, combining CNN and RF / SVM / NB models can be a powerful tool for the review classification task: it exploits the strengths of both models, the CNN's ability to extract features and the RF / SVM / NB's ability to handle large datasets and resist overfitting, and can lead to improved performance over using either model alone. Figure 3 below shows the combined approach of CNN with the other ML algorithms.
Figure 3. Combining DL (CNN) with ML (RF /SVM / NB) Models

To improve performance, the basic CNN algorithm is modified by reducing several layers and replacing the final layers with machine learning (SVM / RF / NB) algorithms, as shown in Figure 3. It has one convolution layer, two dense layers, one max-pooling layer, and one flatten layer. This model has only 11,036 parameters, making it lighter than the existing CNN model. The model has an input shape of (6 × 1), where the features are extracted using NLP techniques and sentiment analysis. The output is a binary classification indicating whether the review is spam or not.
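The stacking idea can be sketched as follows: a lightweight CNN is trained first, and the activations of its last hidden dense layer are then used as input features for a Random Forest (SVM or Naïve Bayes can be substituted in the same way). The layer sizes and the synthetic data below are assumptions chosen for illustration and do not reproduce the exact 11,036-parameter configuration reported above.

```python
# Sketch of the CNN-as-feature-extractor stacked with a Random Forest.
# Layer sizes and the synthetic data are illustrative assumptions only.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
from sklearn.ensemble import RandomForestClassifier

X = np.random.rand(200, 6, 1).astype("float32")   # placeholder: 6 NLP/sentiment features per review
y = np.random.randint(0, 2, size=200)             # placeholder spam (1) / genuine (0) labels

# Lightweight CNN: one conv layer, one max-pooling, one flatten, two dense layers.
inputs = tf.keras.Input(shape=(6, 1))
h = layers.Conv1D(16, kernel_size=2, activation="relu")(inputs)
h = layers.MaxPooling1D(pool_size=2)(h)
h = layers.Flatten()(h)
h = layers.Dense(32, activation="relu", name="feature_layer")(h)
outputs = layers.Dense(1, activation="sigmoid")(h)

cnn = tf.keras.Model(inputs, outputs)
cnn.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
cnn.fit(X, y, epochs=5, batch_size=32, verbose=0)

# Replace the final dense layer with a Random Forest trained on the CNN features.
feature_extractor = tf.keras.Model(inputs, cnn.get_layer("feature_layer").output)
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(feature_extractor.predict(X, verbose=0), y)

print(rf.predict(feature_extractor.predict(X[:5], verbose=0)))   # predicted labels for 5 reviews
```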

3.1.6 Validation Technique


K-fold cross-validation is a resampling procedure used to evaluate the performance of machine
learning models. It involves dividing the data into K "folds" and then training and evaluating the
model K times, each time using a different fold as the test set and the remaining folds as the training
set. The performance measure is then averaged across all K iterations. The main advantage of K-fold cross-validation is that it evaluates the model's performance on a larger portion of the
data than just a single train/test split. This gives a better idea of how the model will perform on
unseen data and can help to reduce the variance of the model's performance estimate. K-fold cross-
validation is a widely used technique in machine learning and is often used to tune the hyper
parameters of a model, select the best model from a set of candidate models, or to estimate the
generalization performance of a model. Here we have used K-fold cross-validation, with setting
the value of K=10.
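A sketch of this procedure with scikit-learn is given below; the synthetic feature matrix is a placeholder for the encoded review features, and a Random Forest stands in for whichever classifier is being validated.

```python
# 10-fold stratified cross-validation sketch with scikit-learn (placeholder data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X = np.random.rand(300, 6)              # placeholder encoded review features
y = np.random.randint(0, 2, size=300)   # placeholder spam / genuine labels

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(RandomForestClassifier(random_state=42), X, y,
                         cv=cv, scoring="accuracy")

print(f"mean accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")
```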
4. RESULTS AND DISCUSSION
This chapter summarizes the experiments described above; each result is discussed with reference to the method used to obtain it. First, the annotated dataset, converted into numeric vectors using the different features, is loaded into PyCharm. The data is split in two different ways: with the train-test split and with K-fold cross-validation, both described earlier. The collected results were tabulated according to the various factors considered in this paper. This section presents the results of classifying the product review dataset with the different classifiers, in some cases varying the parameters of a given classifier.

4.1 Experimental Setup

The model is built on the “sklearn” and “Tensorflow” framework and operates on the “Pycharm”
IDE, which is primarily used for machine learning-based research and development. F1-score,
precision, recall, and accuracy are used to assess the performance of suggested models.

4.2 Dataset Description (Web Scraping)


Web scraping refers to the extraction of data from a website. This data can be in the form of HTML,
CSS, JavaScript, or any other type of content found on the website. Web scraping is often used to
gather data for data analysis or to automate tasks on the web [20], [21]. There are several ways
perform web scraping, including using a web scraper tool or writing a script that makes HTTP
requests to the website and parses the response. Some popular tools for web scraping include
Python's Beautiful Soup library, the Ruby gem Nokogiri, and the Java library JSoup.
It's important to note that web scraping may be against the terms of use of some websites,
and it can be illegal in some cases. It's always a good idea to check the terms of use and make sure
you have permission before you start scraping a website. There are several types of web scraping,
including: Structured Data Scraping, Unstructured Data Scraping, Dynamic Data Scraping,
JavaScript Rendering and API Scraping.
We used the API scraping technique. API scraping involves extracting data from APIs
(Application Programming Interfaces) provided by websites. APIs allow developers to access data
from a website in a structured and controlled way, making it easier to scrape. Here we use Amazon
review dataset. Amazon review data is a collection of customer reviews, ratings, and metadata for
products sold on Amazon.com. The data includes information about the product being reviewed,
the customer who left the review, and the review itself. This data is often used by researchers and
companies to analyze customer opinions and preferences, as well as to improve products and
customer experience. The data is typically collected and made available in a structured format,
such as a table or spreadsheet, with each row representing a single review and each column
representing a different piece of information about the review, such as the customer's rating, the
review text, and the date the review was written. Some datasets may also include additional
metadata, such as the product's category, price, and brand. Amazon review data is typically used
for a variety of tasks, including sentiment analysis, product recommendation, and market research.
It is often used in conjunction with other data sources, such as product descriptions and sales data,
to gain a more complete understanding of customer behavior and preferences. To download the dataset, we used a web scraping technique and collected product reviews of both the positive and critical categories; a hedged sketch of such a scraper is given below, and Figure 4 illustrates the review scraping process.
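The sketch below shows the general shape of such a scraper with requests and BeautifulSoup; the URL and CSS selectors are hypothetical placeholders (real product pages differ, and our dataset was collected through the API route described above), and any scraping must respect the site's terms of use and robots.txt.

```python
# Hedged review-scraping sketch with requests + BeautifulSoup.
# URL and CSS selectors are hypothetical placeholders, not Amazon's actual markup.
import requests
from bs4 import BeautifulSoup

URL = "https://www.example.com/product/reviews?page=1"        # placeholder URL
HEADERS = {"User-Agent": "Mozilla/5.0 (research scraper)"}

response = requests.get(URL, headers=HEADERS, timeout=10)
soup = BeautifulSoup(response.text, "html.parser")

reviews = []
for block in soup.select("div.review"):                       # hypothetical selector
    reviews.append({
        "rating": block.select_one("span.rating").get_text(strip=True),
        "text": block.select_one("span.review-text").get_text(strip=True),
    })

print(f"collected {len(reviews)} reviews")
```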

Figure 4. Web scraping

4.3 Performance Parameters

Confusion Matrix: A confusion matrix is a table that is used to evaluate the performance of a
classification model. It is a common tool in machine learning and data analysis, and is particularly
useful for understanding the performance of a model on a classification task. The confusion matrix
is constructed by comparing the predicted classes of the model with the true classes of the test
data. The rows of the matrix represent the predicted classes, and the columns represent the true
classes. The confusion matrix typically includes four elements: true positives, true negatives, false positives, and false negatives. It can be used to calculate several evaluation metrics, including accuracy, precision, recall, and F1
score. These metrics can be useful for understanding the strengths and weaknesses of the model
and for comparing the performance of different models.

• Precision = True Positive / (True Positive + False Positive)


• Recall = True Positive / (True Positive + False Negative)
• F1-Score = 2 * ((Precision * Recall) / (Precision + Recall))
• Accuracy = (True Positive + True Negative) / (True Positive + False Positive + False Negative + True Negative)
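These quantities can be computed directly with scikit-learn, as sketched below on a small set of illustrative labels (1 = spam, 0 = genuine); note that scikit-learn's confusion matrix lays out true classes on the rows and predicted classes on the columns.

```python
# Confusion matrix and evaluation metrics with scikit-learn (illustrative labels).
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # ground-truth labels: 1 = spam, 0 = genuine
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # classifier predictions

print(confusion_matrix(y_true, y_pred))          # rows: true class, columns: predicted class
print("precision =", precision_score(y_true, y_pred))
print("recall    =", recall_score(y_true, y_pred))
print("f1-score  =", f1_score(y_true, y_pred))
print("accuracy  =", accuracy_score(y_true, y_pred))
```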

Figure. 5 Confusion Matrix


4.4 Results

The proposed work used Naïve Bayes (NB), support vector machine (SVM), Random Forest (RF),
Neural Network (NN), Convolution Neural Network (CNN) and Hybrid (CNN+RF, CNN+SVM,
CNN+NB) classifiers to detect whether the Amazon product review data is spam or not. The
proposed work applied TF-IDF, NLP processing, and sentiment analysis before applying the machine learning, deep learning, and hybrid classifiers. Table 2 (A, B) gives the performance measures (in percent) obtained for spam detection after applying the feature extraction techniques to the product review dataset downloaded from Kaggle. Figures 6 and 7 show the performance parameter comparison graphs for spam review detection on the Kaggle dataset.

Table 2 (A). Performance comparison of various algorithms on the spam review dataset downloaded from Kaggle (validation technique: train-test split, 80%-20%)

| Classifier | Accuracy | Precision | Recall | F1-Score | Time (sec) |
|---|---|---|---|---|---|
| SVM | 74.19 | 74.24 | 76.04 | 73.16 | 8.66 |
| NB | 82.13 | 83.54 | 81.33 | 82.41 | 3.45 |
| RF | 84.52 | 84.64 | 82.11 | 83.67 | 2.94 |
| NN | 95.39 | 94.98 | 91.11 | 92.90 | 19.13 |
| CNN | 96.77 | 96.38 | 92.81 | 94.74 | 6.41 |
| CNN + NB | 98.21 | 98.89 | 95.52 | 96.19 | 3.24 |
| CNN + RF | 99.04 | 100.0 | 97.90 | 98.84 | 2.85 |
| CNN + SVM | 98.19 | 98.81 | 96.85 | 96.74 | 3.96 |

Figure 6. Performance parameters comparison graph for spam reviews detection on the Kaggle dataset with train-test split (80%-20%) technique
Table 2 (B). Performance comparison of various algorithms on the spam review dataset downloaded from Kaggle (validation technique: 10-fold cross-validation)

| Classifier | Accuracy | Precision | Recall | F1-Score | Time (sec) |
|---|---|---|---|---|---|
| SVM | 67.56 | 68.97 | 64.85 | 66.73 | 47.57 |
| NB | 64.86 | 66.40 | 65.74 | 65.11 | 14.84 |
| RF | 72.22 | 71.54 | 73.00 | 72.50 | 27.72 |
| NN | 91.64 | 90.94 | 91.39 | 91.16 | 142.56 |
| CNN | 92.96 | 91.98 | 90.11 | 91.90 | 31.52 |
| CNN + NB | 96.59 | 97.13 | 94.44 | 92.16 | 28.64 |
| CNN + RF | 97.74 | 97.17 | 97.19 | 94.68 | 24.60 |
| CNN + SVM | 97.19 | 98.42 | 92.87 | 94.56 | 49.54 |

Figure 7. Performance parameters comparison graph for spam reviews detection on the Kaggle dataset with 10-fold cross-validation technique.

Table 3 (A, B) gives the performance measures (in percent) obtained for spam detection after applying the feature extraction techniques to the dataset downloaded from the Amazon website (Amazon product reviews) using the web scraping technique. Figures 8 and 9 show the performance parameter comparison graphs for spam review detection on the Amazon review dataset.
Table 3 (A). Performance comparison of various algorithms on the spam review dataset downloaded from the Amazon website using the web scraping technique (validation technique: train-test split, 80%-20%)

| Classifier | Accuracy | Precision | Recall | F1-Score | Time (sec) |
|---|---|---|---|---|---|
| SVM | 56.02 | 52.75 | 58.68 | 56.55 | 15.08 |
| NB | 69.85 | 66.76 | 69.57 | 68.41 | 8.15 |
| RF | 74.47 | 74.21 | 75.74 | 74.42 | 12.45 |
| NN | 92.41 | 91.55 | 93.49 | 92.5 | 42.44 |
| CNN | 95.56 | 94.48 | 95.0 | 95.04 | 18.10 |
| CNN + NB | 96.81 | 96.81 | 92.74 | 94.57 | 8.04 |
| CNN + RF | 97.44 | 97.13 | 98.74 | 98.05 | 8.50 |
| CNN + SVM | 97.75 | 98.59 | 98.19 | 97.13 | 9.44 |

Figure 8. Performance parameters comparison graph for spam reviews detection on the Amazon product review dataset downloaded using the web scraping technique, with train-test split (80%-20%)
Table 3 (B). Performance comparison of various algorithms on the spam review dataset downloaded from the Amazon website using the web scraping technique (validation technique: 10-fold cross-validation)

| Classifier | Accuracy | Precision | Recall | F1-Score | Time (sec) |
|---|---|---|---|---|---|
| SVM | 51.14 | 49.85 | 50.47 | 49.06 | 111.3 |
| NB | 64.28 | 64.21 | 66.01 | 64.5 | 39.41 |
| RF | 60.28 | 62.25 | 59.8 | 60.74 | 58.68 |
| NN | 74.75 | 73.56 | 75.41 | 74.56 | 157.68 |
| CNN | 86.44 | 89.21 | 82.2 | 85.45 | 103.5 |
| CNN + NB | 90.96 | 92.42 | 89.56 | 89.10 | 35.4 |
| CNN + RF | 91.54 | 91.04 | 90.87 | 90.89 | 38.79 |
| CNN + SVM | 89.67 | 88.85 | 89.77 | 89.54 | 42.14 |
Figure 9. Performance parameters comparison graph for spam reviews detection on the Amazon product review dataset (web scraping) with 10-fold cross-validation

Figure 10. ROC Curve comparison of algorithms (a) NB (b) RF (c) SVM (d) NN (e) CNN (f) CNN + NB (g) CNN +
SVM (h) CNN + RF
Figure 10 shows the ROC curve comparison of the various algorithms on the Amazon product spam review classification task. From the graph it is clear that the deep learning algorithm and the combined deep learning approaches show better ROC curves than the traditional machine learning approaches.
4.5 Discussion

This section discusses the algorithms used for spam detection. We provide researchers with a comparative analysis of different spam review detection methods and their reported accuracy alongside our proposed hybrid classification technique. Two datasets were used: (1) a product review dataset downloaded from Kaggle and (2) an Amazon product review dataset downloaded from the Amazon website using the web scraping technique. K-fold cross-validation was applied to both datasets. The accuracy of the different supervised-learning-based approaches is presented in Tables 2 and 3, which show that the proposed combined ML-DL algorithms outperform traditional machine learning and deep learning algorithms in terms of accuracy, precision, recall, and F1-score.

5. CONCLUSION AND FUTURE WORK


Firstly, this paper studies and analyses the current product spam reviews techniques. Based on the
analysis of the influencing factors of product review classification accuracy, an optimization
system process based on hybrid classification is proposed, and the intelligent optimization
algorithm is combined to improve the spam review detection accuracy. Therefore, this paper
introduces hybrid machine learning and deep learning algorithm (CNN + RF / CNN + SVM / CNN
+ NB) for product spam review detection on the web. The research results show the effectiveness of our proposed model, which we validated using the K-fold cross-validation technique. The proposed hybrid classifiers achieved a classification accuracy of 96%–99% on both datasets with the (80%-20%) train-test split and a validation accuracy of 90%–92% using K-fold cross-validation.
We plan to extend our technique for spam review detection in other categories / dataset
such as twitter dataset, movie reviews dataset or news in the future. We will also combine the
suggested technique with other known transfer learning techniques (pre-trained model) to improve
their performance in terms of accuracy and reduce their time complexity.

REFERENCES
[1] S. N. Alsubari et al., “Data analytics for the identification of fake reviews using supervised learning,”
Comput. Mater. Contin., vol. 70, no. 2, pp. 3189–3204, 2022, doi: 10.32604/cmc.2022.019625.
[2] I. Amin and M. Kumar Dubey, “An overview of soft computing techniques on Review Spam Detection,”
Proc. 2021 2nd Int. Conf. Intell. Eng. Manag. ICIEM 2021, pp. 91–96, 2021, doi:
10.1109/ICIEM51511.2021.9445280.
[3] A. Mewada and R. K. Dewang, “A comprehensive survey of various methods in opinion spam detection,”
Multimed. Tools Appl., 2022, doi: 10.1007/s11042-022-13702-5.
[4] S. Saumya and J. P. Singh, “Spam review detection using LSTM autoencoder: an unsupervised approach,”
Electron. Commer. Res., vol. 22, no. 1, pp. 113–133, 2022, doi: 10.1007/s10660-020-09413-4.
[5] V. Gupta, A. Aggarwal, and T. Chakraborty, “Detecting and Characterizing Extremist Reviewer Groups in
Online Product Reviews,” IEEE Trans. Comput. Soc. Syst., vol. 7, no. 3, pp. 741–750, 2020, doi:
10.1109/TCSS.2020.2988098.
[6] M. Fayaz, A. Khan, J. U. Rahman, A. Alharbi, M. I. Uddin, and B. Alouffi, “Ensemble machine learning
model for classification of spam product reviews,” Complexity, vol. 2020, 2020, doi: 10.1155/2020/8857570.
[7] S. Mani, S. Kumari, A. Jain, and P. Kumar, Spam review detection using ensemble machine learning, vol.
10935 LNAI. Springer International Publishing, 2018.
[8] M. S. Jacob, S. Rajendran, V. Michael Mario, K. T. Sai, and D. Logesh, “Fake Product Review Detection
and Removal Using Opinion Mining Through Machine Learning,” Proc. Int. Conf. Artif. Intell. Smart Grid Smart
City Appl., pp. 587–601, 2020, doi: 10.1007/978-3-030-24051-6_55.
[9] D. V. Et. al., “Online Products Fake Reviews Detection System Using Machine Learning,” Turkish J.
Comput. Math. Educ., vol. 12, no. 1S, pp. 29–39, 2021, doi: 10.17762/turcomat.v12i1s.1548.
[10] G. M. Shahariar, S. Biswas, F. Omar, F. M. Shah, and S. Binte Hassan, “Spam Review Detection Using Deep
Learning,” 2019 IEEE 10th Annu. Inf. Technol. Electron. Mob. Commun. Conf. IEMCON 2019, pp. 27–33, 2019,
doi: 10.1109/IEMCON.2019.8936148.
[11] K. Dhingra and S. K. Yadav, “Spam analysis of big reviews dataset using Fuzzy Ranking Evaluation
Algorithm and Hadoop,” Int. J. Mach. Learn. Cybern., vol. 10, no. 8, pp. 2143–2162, 2019, doi: 10.1007/s13042-017-
0768-3.
[12] S. Jamshidi, R. Rejaie, and J. Li, “Characterizing the dynamics and evolution of incentivized online reviews
on Amazon,” Soc. Netw. Anal. Min., vol. 9, no. 1, pp. 1–15, 2019, doi: 10.1007/s13278-019-0563-0.
[13] S. Shringi and H. Sharma, “Detection of spam reviews using hybrid grey wolf optimizer clustering method,”
Multimed. Tools Appl., vol. 81, no. 27, pp. 38623–38641, 2022, doi: 10.1007/s11042-022-12848-6.
[14] P. Bhuvaneshwari, A. N. Rao, and Y. H. Robinson, “Spam review detection using self attention based CNN
and bi-directional LSTM,” Multimed. Tools Appl., vol. 80, no. 12, pp. 18107–18124, 2021, doi: 10.1007/s11042-021-
10602-y.
[15] N. Jain, A. Kumar, S. Singh, C. Singh, and S. Tripathi, Deceptive Reviews Detection Using Deep Learning
Techniques, vol. 11608 LNCS. Springer International Publishing, 2019.
[16] N. Ilakiyaselvan, S. K. J, and S. Verma, “FRAUDULENT REVIEWS DETECTION USING MACHINE
LEARNING ALGORITHM,” vol. 7, no. 15, pp. 1635–1645, 2020.
[17] S. Ahmed and F. Muhammad, “Using Boosting Approaches to Detect Spam Reviews,” 1st Int. Conf. Adv.
Sci. Eng. Robot. Technol. 2019, ICASERT 2019, vol. 2019, no. Icasert, 2019, doi: 10.1109/ICASERT.2019.8934467.
[18] G. Bathla, P. Singh, R. K. Singh, E. Cambria, and R. Tiwari, “Intelligent fake reviews detection based on
aspect extraction and analysis using deep learning,” Neural Comput. Appl., vol. 34, no. 22, pp. 20213–20229, 2022,
doi: 10.1007/s00521-022-07531-8.
[19] R. K. Dewang and A. K. Singh, “State-of-art approaches for review spammer detection: a survey,” J. Intell.
Inf. Syst., vol. 50, no. 2, pp. 231–264, 2018, doi: 10.1007/s10844-017-0454-7.
[20] S. K. Chauhan, A. Goel, P. Goel, A. Chauhan, and M. K. Gurve, “Research on Product Review Analysis and
Spam Review Detection,” pp. 2–5, 2017.
[21] E. Suganya and S. Vijayarani, Sentiment Analysis for Scraping of Product Reviews from Multiple Web Pages
Using Machine Learning Algorithms. Springer International Publishing, 2020.
[22] A. Ghourabi and M. A. Mahmood, “A Hybrid CNN-LSTM Model for SMS Spam Detection in Arabic and
English Messages,” pp. 1–16, 2020, doi: 10.3390/fi12090156.
[23] S. P. Rajamohana, “An Effective Hybrid Cuckoo Search with Harmony Search for Review Spam Detection,”
pp. 978–981, 2017.
[24] N. Kumari, A. Yadav, and P. K. Jana, “Task offloading in fog computing : A survey of algorithms and
optimization techniques,” Comput. Networks, vol. 214, no. June, p. 109137, 2022, doi:
10.1016/j.comnet.2022.109137.
[25] M. Dolly Nithisha, B. Divya Sri, P. Lekhya Sahithi, and M. Suneetha, Unfair Review Detection on Amazon
Reviews Using Sentiment Analysis, vol. 853. Springer Singapore, 2022.
[26] A. P. Rodrigues et al., “Real-Time Twitter Spam Detection and Sentiment Analysis using Machine Learning
and Deep Learning Techniques,” Comput. Intell. Neurosci., vol. 2022, 2022, doi: 10.1155/2022/5211949.
[27] S. Girgis and M. Gadallah, “Deep learning algorithms for detecting fake new in online text,” pp. 93–97, 2018.
[28] J. Abdul, O. Subhani, and I. Varlamis, “International Journal of Information Management Data Insights Fake
news detection : A hybrid CNN-RNN based deep learning approach,” vol. 1, no. December 2020, 2021, doi:
10.1016/j.jjimei.2020.100007.
[29] Y. Jian, X. Chen, and H. W. B, Deep Neural Networks with Hybrid Feature Fusion Method. Springer
International Publishing, 2022.
[30] S. Lin, “Fake Reviews Detection with Hybrid Features Using Time-Sequential Deep Learning Model,” pp. 3–5.
