IEEE International Conference on Innovations in Green Energy and Healthcare Technologies(ICIGEHT’17)
A SURVEY ON ONLINE REVIEW SPAM DETECTION TECHNIQUES

SP. Rajamohana (1), Dr. K. Umamaheswari (2), M. Dharani (3), R. Vedackshya (4)
(1) Assistant Professor (Sr. Gr.), (2) Professor, (3) UG Student, (4) UG Student
Department of Information Technology, PSG College of Technology, Coimbatore-4, India.
ABSTRACT

The development of the Internet has influenced many of our day-to-day
activities. E-commerce is one of the fastest-growing areas of the Internet
era. People are eager to buy products from online sites such as Amazon, eBay
and Flipkart. These sites also let customers write reviews of the products
they buy. The reviews help consumers and vendors make decisions on marketing
strategies and on the improvement of products and services. Nowadays people
are very interested in reading reviews before purchasing a product or using
a service. This creates an opportunity for opinion spammers to write fake
reviews that promote or demote products and business services. This type of
activity is often referred to as review spam. Detecting fake reviews has
therefore become a critical issue, both for customers, who want to make
better purchase decisions, and for vendors, who want their products to be
trusted. This paper presents a detailed survey of machine learning
techniques for distinguishing spam reviews from genuine reviews.

Keywords: Fake reviews, Fake Review Detection Techniques, Feature Selection
and Classification.

I. INTRODUCTION

“What other people think” has always been an important piece of information
for most of us during the decision-making process. Presently, web users post
their reviews and opinions on websites, blogs, social media such as Facebook
and Twitter, web forums, e-commerce sites, RSS feeds, etc. These opinions
are helpful for both business organizations and individuals. Such reviewing
systems encourage some people to enter fake reviews to promote some products
or defame others [1]. Those people are called opinion spammers, and their
actions are called opinion spamming. The main motive is to make more profit
by writing untrustworthy reviews and deceitful ratings. To make products and
services trustworthy, these fake opinions must be detected and removed.
Detection techniques are used to find fake reviews, fake reviewers and fake
reviewer groups [2].

This survey paper reviews various methodologies adopted for detecting fake
reviews and opinion spammers. The rest of the paper discusses the types of
review spam and the accuracy achieved by each technique.

II. TYPES OF REVIEW SPAM

Type-I, Untruthful review: the reviews written are not based on the
reviewers' genuine experience of using the products or services. Such
reviews are classified into two kinds: hyper spam (containing undeserved
positive opinions about some target entities in order to promote them) and
defaming spam (containing false negative opinions about some other entities
in order to damage their reputations) [1].
978-1-5090-5778-8/17/$31.00©2017 IEEE
Department of BME & EEE, Dr.N.G.P. Institute of Technology

Type-II, Review on brand only: comments only on the brands or the
manufacturers of the products. Some may be genuine, but they are considered
spam because they are not targeted at specific products [1].

Type-III, Non-review: not a review, but advertisements and other irrelevant
text containing no opinions (e.g., questions, answers, and random text). A
spammer may work individually, or may knowingly or unknowingly work as a
member of a group [1].

III. ADAPTED METHODS

A. Language model and feature selection

This method detects Type-I and Type-II review spam.

Type-I detection uses an n-gram model to detect duplicates by checking
whether
a. the same review appears for the same product from different user ids;
b. the same review appears for different products from the same or
   different user ids;
c. the same user id appears for different products.

Type-II detection uses feature selection to detect brand spam. The
duplicates are observed with a semantic language model, which is reported to
be more efficient and effective at finding duplicates. The results of this
experiment help companies improve their sales performance and help customers
buy products [3].

B. Modified method of iterative computation framework

This method is used for detecting Type-I, Type-II and Type-III review spam.

Type-I and Type-II are handled by the ICF++ method, which goes through
several stages: POS tagging, creation of a transaction file, FP-growth,
polarity generation and calculation of an agreement value. The review is
evaluated using these stages. The performance of this model is also compared
with other detection techniques. The product name, reviewer name, review
text and minsup (minimum support) are taken as inputs, and the method
undergoes three iterations to detect fake reviews.

Type-III uses an IR-based evaluation strategy. The number of times a
reviewer gives an opinion opposite to the other reviews of the same product
is checked to detect review spam. The accuracy of ICF++ is only 6% higher
than that of the ICF method, and ICF++ detects fake reviews only within a
short span of time [4].

C. Learning to Identify Review Spam

In this paper, the authors build a review spam collection based on crawled
reviews. They employ supervised learning methods and analyze the effect of
different features on review spam identification. Based on the observation
that a spammer consistently writes spam, the author of a review can be
identified as a spammer. To exploit the large amount of unlabeled data, they
propose a two-view semi-supervised method (co-training).

The results show that the two-view co-training algorithms achieve better
results than single-view algorithms. The designed machine learning methods
achieve some improvement over the heuristic baselines. As future work, the
authors plan to further exploit the two-view algorithm and to test the
co-training algorithm on other opinion resources such as blogs or Twitter
[5].

D. Detection of Fake Review and Brand Spam Using Data Mining Technique

This method detects Type-I and Type-II review spam.

a. The system first downloads (or crawls) all the reviews and puts them in
   the review database.

b. The dataset contains many reviews, so a clustering algorithm is used to
   group them, and an individual ARFF text file is generated for each group.

c. This ARFF file is given as input to the classifier, which is used to
   classify the reviews.

d. After classification is complete, fake and truthful reviews are
   identified. These reviews then qualify for further checking for brand
   spam detection.

This paper mainly focuses on review-centric spam identification, which gives
greater weight to the feedback content. As future work, review spammer
identification can be integrated into review identification and vice versa.
Exploring different ways to learn behaviour patterns related to spamming, so
as to improve the accuracy of the current regression model, is an
interesting research direction [6].

E. Exploiting Task Relatedness and Unlabeled Data

In this paper, the authors first developed a multi-task learning method
through logistic regression (MTL-LR). This enhances the learning for one
task by sharing the knowledge contained in the training signals of other
related tasks. To make use of unlabeled data, they incorporated a graph
Laplacian regularizer into each base model and proposed a semi-supervised
multi-task learning model through Laplacian regularized logistic regression
(SMTL-LLR). They developed a stochastic alternating method to deal with the
optimization problem.

The results demonstrated the superiority of SMTL-LLR over various
well-established baseline methods. However, ground-truth review data from
microblogs or forums was not considered, and incorporating the model into a
practical opinion mining system remains to be done [7].

F. Hybrid Optimization Algorithm of Improved Binary Particle Swarm
Optimization (iBPSO) and Cuckoo Search for Review Spam Detection

In this paper, the authors implemented a hybrid approach of iBPSO and cuckoo
search for feature subset selection. In iBPSO, a linearly decreasing inertia
weight is combined with a convergence factor to balance exploration and
exploitation. After a certain number of iterations, the algorithm provides
an optimized feature subset. Binary cuckoo search is used for feature
selection. Classification accuracy is used as the fitness function. In
testing, classification is done using the Naive Bayes and kNN algorithms.
Experimental results show that the hybrid approach performs well.

G. An Effective Hybrid Cuckoo Search with Harmony Search for Review Spam
Detection

In this work, the authors address Type-I reviews. They use the hotel review
dataset of Ott et al. [8].

a. A preprocessing technique is applied to remove punctuation and stop
   words, such as articles and prepositions, from the dataset.

b. A hybrid approach of the cuckoo search and harmony search algorithms is
   applied for feature selection, and Naive Bayes and kNN classifiers are
   used for classification.

The results show that the hybrid algorithm identifies a better-quality
feature subset than the binary cuckoo search algorithm.

The hybrid approach achieves 91% accuracy using the Naive Bayes classifier.
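
Sections F and G both search over feature subsets and score each subset by
classification accuracy. The sketch below illustrates that idea with a plain
binary PSO using a linearly decreasing inertia weight. It is an illustration
under assumptions, not the surveyed authors' implementation: the parameter
values (w_max, w_min, c1, c2, v_max), the sigmoid transfer function and the
names `binary_pso`, `toy_fitness` and `USEFUL` are all hypothetical, the
convergence factor and the cuckoo or harmony search hybridisation are left
out, and the fitness is a stand-in rather than a trained classifier.

```python
import math
import random


def inertia_weight(t, max_iter, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight: w_max at t = 0, w_min at t = max_iter."""
    return w_max - (w_max - w_min) * t / max_iter


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def binary_pso(fitness, n_features, n_particles=10, max_iter=30, seed=0,
               c1=2.0, c2=2.0, v_max=6.0):
    """Search for a 0/1 feature mask that maximises fitness(mask)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    best = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]

    for t in range(max_iter):
        w = inertia_weight(t, max_iter)
        for i in range(n_particles):
            for d in range(n_features):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp the velocity
                # Sigmoid transfer: the velocity gives the probability of bit = 1.
                pos[i][d] = 1 if rng.random() < sigmoid(vel[i][d]) else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Stand-in fitness (NOT classification accuracy): rewards masks that agree
# with a hypothetical set of "useful" features. A real wrapper would train
# Naive Bayes or kNN on the selected features and return its accuracy.
USEFUL = [1, 0, 1, 1, 0, 0, 1, 0]


def toy_fitness(mask):
    return sum(1 for m, u in zip(mask, USEFUL) if m == u) / len(USEFUL)


mask, score = binary_pso(toy_fitness, n_features=len(USEFUL))
print(mask, score)
```

Replacing `toy_fitness` with a function that trains a classifier on the
selected columns and returns held-out accuracy would turn this into a
wrapper-style feature selector of the kind the two sections describe.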
S.no | Paper Title | Dataset used | Techniques used | Type of Review spam | Evaluation Metrics
-----|-------------|--------------|-----------------|---------------------|-------------------
A | Online Review Spam Detection Using Language Model and Feature Selection | Restaurant, Hotel and Doctor Dataset | Unigram, Bigram, SMTL-LLR | Type-I, Type-II | Accuracy: SMTL-LLR 87.2%; MTL-LR 85.2%; MTRL 84.0%; TSVM 82.9%; LR 82.1%; SVM 81.8%; PU 72.6%
B | Fake Review Detection From A Product Review Using Modified Method of Iterative Computation Framework | Amazon Dataset | ICF, ICF++ | Type-I, Type-II, Type-III | Precision: ICF 57%; ICF++ 63%
C | Learning To Identify Review Spam | Epinions Dataset | Unigram and Bigram | Type-I | Precision, Recall and F-measure: NB 0.517, 0.669, 0.583; NB-Bootstrapping 0.621, 0.575, 0.597; Co-Training 0.630, 0.589, 0.609; Co-Training (Agreement) 0.641, 0.621, 0.631
D | Detection of Fake Review and Brand Spam Using Data Mining Technique | Amazon Dataset | NLP Techniques | Type-I, Type-II | Accuracy: Naive Bayes 96%; SVM 98%
E | Deceptive Review Spam Detection via Exploiting Task Relatedness and Unlabeled Data | Amazon Hotel Review Dataset | Convolutional Neural Network | Type-I | Precision, Recall and F-measure: BIGRAMS-SVM 0.714, 0.696, 0.732; TRIGRAMS-SVM 0.722, 0.703, 0.741; PWCC 0.749, 0.741, 0.759; Bagging 0.772, 0.764, 0.781
F | Hybrid Optimization Algorithm of Improved Binary Particle Swarm Optimization (iBPSO) and Cuckoo Search for Review Spam Detection | Hotel Review Dataset | Feature Selection: iBPSO and Cuckoo Search | Type-I | Accuracy: Naive Bayes 96.97%; kNN 88.23%
G | An Effective Hybrid Cuckoo Search with Harmony Search for Review Spam Detection | Hotel Review Dataset | Feature Selection: Cuckoo with Harmony Search | Type-I | Accuracy: Naive Bayes 91.12%; kNN 82.34%

Table 1: Performance Analysis of Review Spam Detection Techniques
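
Several rows of Table 1 report precision, recall and F-measure together. The
survey does not state the formula, so the sketch below assumes the usual
definition of the F-measure as the harmonic mean of precision and recall,
F = 2PR / (P + R), and recomputes it for the row C figures. The function
name `f_measure` and the dictionary `ROW_C` are illustrative only.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the usual F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) as listed for row C of Table 1.
ROW_C = {
    "NB": (0.517, 0.669, 0.583),
    "NB-Bootstrapping": (0.621, 0.575, 0.597),
    "Co-Training": (0.630, 0.589, 0.609),
    "Co-Training (Agreement)": (0.641, 0.621, 0.631),
}

for name, (p, r, reported) in ROW_C.items():
    print(f"{name}: computed {f_measure(p, r):.3f}, reported {reported:.3f}")
```

Under this assumption the recomputed values agree with the reported row C
figures to three decimal places.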

V. CONCLUSION

Deceptive reviews on the Internet increasingly affect businesses and
customers. Hence it is important to detect and eliminate such fake reviews
from online websites. This paper surveyed several approaches used for review
spam detection and identified their performance measures. It also described
Type-I, Type-II and Type-III review spam. The topic needs further research
using Big Data approaches to reduce the number of features and the
computational complexity, which would help improve the detection methods.
Future work should also consider other kinds of media such as forums and
blogs, which remain largely unexplored in this regard.
REFERENCES

1. Daya Mevada, Viraj Daxini, “An opinion spam analyzer for product reviews
   using supervised machine learning method”, Journal of Information,
   Knowledge and Research in Computer Engineering, 2015.

2. Nujud Aloshban, Hmood Aldossari, “A new approach for group spam detection
   in social media for Arabic language (AGSD)”, The International Conference
   on Latest Trends in Engineering and Technology (ICLTET'2016), 2016.

3. Manali S. Patil, A. M. Bagade, “Online review spam detection using
   language model and feature selection”, International Journal of Computer
   Applications, 2012.

4. Eka Dyar Wahyuni and Arif Djunaidy, “Fake review detection from a product
   review using modified method of iterative computation framework”, MATEC
   Web of Conferences, 2016.

5. Fangtao Li, Minlie Huang, Yi Yang and Xiaoyan Zhu, “Learning to Identify
   Review Spam”, Proceedings of the Twenty-Second International Joint
   Conference on Artificial Intelligence, 2011.

6. Rashmi Gomatesh Adikel, Vivekanand Reddy, “Detection of Fake Review and
   Brand Spam Using Data Mining Technique”, International Journal of Recent
   Trends in Engineering & Research (IJRTER), 2016.

7. Zhen Hai, Peilin Zhao, Peng Cheng, Peng Yang, Xiao-Li Li, Guangxia Li,
   “Deceptive Review Spam Detection via Exploiting Task Relatedness and
   Unlabeled Data”, Proceedings of Conference on Empirical Methods in
   Natural Language Processing, 2016.

8. SP. Rajamohana, Dr. K. Umamaheswari, “Hybrid Optimization Algorithm of
   improved Binary Particle Swarm Optimization (iBPSO) and Cuckoo Search for
   Review Spam Detection”, Proceedings of 9th International Conference on
   Machine Learning and Computing, Singapore, 2017.

9. SP. Rajamohana, Dr. K. Umamaheswari, “An Effective Hybrid Cuckoo Search
   with Harmony Search for Review Spam Detection”, Proceedings of 3rd
   International Conference on Advances in Electrical, Electronics,
   Information, Communication and Bio-Informatics (AEEICB17), 2017.
