Keywords: Fake reviews, Fake Review Detection Techniques, Feature Selection and Classification.

I. INTRODUCTION

“What other people think” has always been an important piece of information for most of us during the decision-making process. Presently, web users post their reviews and

Type-I Untruthful review: the reviews written are not based on the reviewers' genuine experience of using the products or services. Such reviews are classified into two kinds: hyper spam (containing undeserved positive opinions about some target entities in order to promote them) and defaming spam (containing false negative opinions about some other entities in order to damage their reputations) [1].
978-1-5090-5778-8/17/$31.00©2017 IEEE
Type-II Review on brand only: comments only on the brands or the manufacturers of the products. Some may be genuine, but they are considered spam because they are not targeted at specific products [1].

Type-III Non-review: it is not a review, but contains advertisements and other irrelevant text with no opinions (e.g., questions, answers, and random text). A spammer may work individually, or knowingly or unknowingly work as a member of a group [1].

III. ADAPTED METHODS

A. Language model and feature selection

This method detects Type-I and Type-II review spam.

Type-I detection uses an n-gram model to detect duplicates by checking whether
a. The same review appears for the same product from different user ids
b. The same review appears for different products from the same or different user ids
c. The same user id appears for different products.

Type-II detection uses feature selection to detect brand spam. The duplicates are observed with a semantic language model, which is reported to be more efficient and effective for finding duplicates. The results of this experiment help companies to improve their sales performance and customers to buy products [3].

B. Modified method of iterative computation framework

This method is used for detecting Type-I, Type-II and Type-III review spam.

Type-I and Type-II are handled by the ICF++ method, which goes through several stages: POS tagging, creation of a transaction file, FP-growth, polarity generation and calculation of the agreement value. The review is evaluated using these stages. The performance of this model is also compared with other detection techniques. The product name, reviewer name, review text and minsup (minimum support) are taken as inputs, and the method runs three iterations to detect fake reviews.

Type-III detection uses an IR-based evaluation strategy. The number of times a reviewer gives an opinion opposite to the other reviews of the same product is counted to detect review spam. The accuracy of ICF++ is 6% higher than that of the ICF method. The ICF++ method detects fake reviews only in a short span of time, and its accuracy gain over ICF is only 6% [4].

C. Learning to Identify Review Spam

In this paper, the authors build a review spam collection based on crawled reviews. They employ supervised learning methods and analyze the effect of different features on review spam identification. Based on the observation that a spammer consistently writes spam, one can identify whether the author of a review is a spammer. To exploit the large amount of unlabeled data, they propose “two-view semi-supervised methods”.

The results show that the two-view co-training algorithms achieve better results than single-view algorithms. The designed machine learning methods achieve some improvement over the heuristic baselines. In future work, the authors plan to exploit the two-view algorithm further and to test the co-training algorithm on other opinion resources such as blogs or Twitter [5].

D. Detection of Fake Review and Brand Spam Using Data Mining Technique

This method detects Type-I and Type-II review spam.
a. The system first downloads (or crawls) all the reviews and puts them in the review database.
b. Because the dataset contains many reviews, a clustering algorithm is used to cluster the reviews into groups, and an individual ARFF text file is generated for each group.
c. This ARFF file is given as input to the classifier, which is used to classify the reviews.
d. After classification is complete, fake and truthful reviews are identified. These reviews then qualify for further checking for brand spam detection.

This paper mainly focuses on review-centric spam identification, which places greater focus on feedback content. As part of future work, review spammer identification can be integrated into review identification and vice versa. Exploring different ways to learn behaviour patterns related to spamming, so as to improve the accuracy of the current regression model, is an interesting research direction [6].

E. Exploiting Task Relatedness and Unlabeled Data

In this paper, the authors first developed a multi-task learning method through logistic regression (MTL-LR). This enhances the learning for one task by sharing the knowledge contained in the training signals of other related tasks. They then incorporated a graph Laplacian regularizer into each base model and proposed a semi-supervised multi-task learning model through Laplacian regularized logistic regression (SMTL-LLR). They developed a stochastic alternating method to deal with the optimization problem.

The results demonstrated the superiority of SMTL-LLR over various well-established baseline methods. However, the authors did not consider truthful review data for microblogs or forums, or how to incorporate the model into a practical opinion mining system [7].

F. Hybrid Optimization Algorithm of Improved Binary Particle Swarm Optimization (iBPSO) and Cuckoo Search for Review Spam Detection

In this paper, the authors implemented a hybrid approach of iBPSO and cuckoo search for feature subset selection. In iBPSO, they combined a linearly decreasing inertia weight with a convergence factor to balance exploration and exploitation. After a certain number of iterations, the algorithm provides an optimized feature subset. Binary cuckoo search is used for feature selection. Classification accuracy is used as the fitness function. In testing, classification is done using the Naive Bayes and kNN algorithms. Experimental results show that the hybrid approach performs well.

G. An Effective Hybrid Cuckoo Search with Harmony Search for Review Spam Detection

In this work, the authors addressed Type-I review spam. They used the hotel review dataset of Ott et al. [8].

a. A preprocessing technique is applied to remove punctuation and stop words, such as articles and prepositions, from the dataset.
b. A hybrid approach of the cuckoo and harmony search algorithms is applied for feature selection, and Naive Bayes and kNN classifiers are used for classification.

The results show that the hybrid algorithm identifies a better-quality feature subset than the binary cuckoo search algorithm.
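
The preprocessing step (a) of this method can be illustrated with a minimal sketch. It assumes lowercasing, whitespace tokenization and a small hand-written stop-word list; the survey does not give the tokenizer or stop-word list actually used, and the names STOP_WORDS and preprocess are hypothetical.

```python
import string
from typing import List

# Illustrative stop-word list (an assumption): articles and a few prepositions.
STOP_WORDS = {"a", "an", "the", "in", "on", "at", "of", "to", "for", "with", "from", "by"}


def preprocess(review: str) -> List[str]:
    """Lowercase a review, strip punctuation, and drop stop words."""
    # Delete every ASCII punctuation character in one pass.
    no_punct = review.lower().translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace and keep only tokens that are not stop words.
    return [tok for tok in no_punct.split() if tok not in STOP_WORDS]


tokens = preprocess("The room at the hotel was clean, and the staff was friendly!")
print(tokens)
```

The remaining tokens would then be turned into features (for example unigram counts) before the feature selection step; that part is not sketched here.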
| S.no | Paper Title | Dataset used | Type of Review spam | Techniques used | Evaluation Metrics |
|---|---|---|---|---|---|
| A | Online Review Spam Detection Using Language Model and Feature Selection | Restaurant, Hotel and Doctor Dataset | Type-I, Type-II | Unigram, Bigram, SMTL-LLR | Accuracy: SMTL-LLR: 87.2%; MTL-LR: 85.2%; MTRL: 84.0%; TSVM: 82.9%; LR: 82.1%; SVM: 81.8%; PU: 72.6% |
| B | Fake Review Detection From A Product Review Using Modified Method of Iterative Computation Framework | Amazon Dataset | Type-I, Type-II, Type-III | ICF, ICF++ | Precision: ICF: 57%; ICF++: 63% |
| C | Learning To Identify Review Spam | Epinions Dataset | Type-I | Unigram and Bigram | Precision, Recall and F-measure: NB: 0.517, 0.669, 0.583; NB-Bootstrapping: 0.621, 0.575, 0.597; Co-Training: 0.630, 0.589, 0.609; Co-Training (Agreement): 0.641, 0.621, 0.631 |
| D | Detection of Fake Review and Brand Spam Using Data Mining Technique | Amazon Dataset | Type-I, Type-II | NLP Techniques | Accuracy: Naive Bayes: 96%; SVM: 98% |
| E | Deceptive Review Spam Detection via Exploiting Task Relatedness and Unlabeled Data | Amazon Hotel Review Dataset | Type-I | Convolutional Neural Network | Precision, Recall and F-measure: BIGRAMS-SVM: 0.714, 0.696, 0.732; TRIGRAMS-SVM: 0.722, 0.703, 0.741; PWCC: 0.749, 0.741, 0.759; Bagging: 0.772, 0.764, 0.781 |
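
To show how the precision, recall and F-measure figures relate, the sketch below recomputes the F-measure for row C (Learning To Identify Review Spam). It assumes the F-measure is the balanced F1 score, the harmonic mean of precision and recall; the survey does not state which variant was used. The function name f_measure and the dictionary layout are illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported for row C of the comparison table.
row_c = {
    "NB": (0.517, 0.669),
    "NB-Bootstrapping": (0.621, 0.575),
    "Co-Training": (0.630, 0.589),
    "Co-Training (Agreement)": (0.641, 0.621),
}

# Recompute the F-measure for each technique, rounded to three decimals.
f_scores = {name: round(f_measure(p, r), 3) for name, (p, r) in row_c.items()}
for name, score in f_scores.items():
    print(f"{name}: {score}")
```

Under this assumption the recomputed values match the F-measures reported for row C to three decimal places (0.583, 0.597, 0.609 and 0.631).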
V. CONCLUSION