

Pairwise Reviews Ranking and Classification for
Medicine E-Commerce Application
Shaurya Uppal, Computer Science Department, Jaypee Institute of Information Technology, Noida, India, shauryauppal00111@gmail.com
Ambikesh Jayal, Cardiff School of Technologies, Western Avenue, Cardiff, CF5 2YB, ajayal@cardiffmet.ac.uk
Anuja Arora, Computer Science Department, Jaypee Institute of Information Technology, Noida, India, anuja.arora29@gmail.com

Abstract— E-Commerce applications give customers an added advantage when buying a product: suggestions in the form of reviews. Obviously, reviews are useful and impactful for customers who are going to buy a product. But the enormous number of reviews also creates a problem for customers, as they are not able to segregate the useful ones. Therefore, there is a need for an approach which showcases only relevant reviews to customers. This problem is attempted in this research paper, as it is a less explored area. A pairwise review relevance ranking method is proposed. The approach sorts reviews based on their relevance to the product and avoids showing irrelevant reviews. The work has been done in three phases: feature extraction, pairwise review ranking, and classification. The outcome is a sorted list of reviews, the review ranking accuracy, and the classification accuracy. Four classifiers (SVM, Random Forest, Neural Network, and Logistic Regression) have been applied to validate the ranking accuracy. Out of the four applied classification models, Random Forest gives the best result: our proposed system is able to achieve 99.76% classification accuracy and 99.56% ranking accuracy for the complete dataset using Random Forest.

Keywords—Pairwise Ranking; Classification; Medicine; SVM; Random Forest; Neural Network; Logistic Regression

I. INTRODUCTION
Nowadays, review ranking has become an extremely challenging issue due to the advancement of e-commerce sites. Every product on an e-commerce site has reviews, and a customer who is going to buy that specific product will surely want to look at the reviews. But the sheer count of reviews creates a problem and leaves the customer in a conflicted state.
To resolve this issue, e-commerce sites have started filtering reviews according to customers' desires in order to improve customer satisfaction while reading reviews. Some popular ways to filter reviews are as follows:
- Amazon filters reviews by 'Top Reviews', based on the review helpfulness score, and by 'Most Recent', based on review post time.
- Flipkart has four ways to showcase reviews: Most Helpful, Most Recent, Positive First, and Negative First.
- Many other e-commerce sites offer similar options.

All the ways discussed above are based either on a statistical value provided by users or on a sentiment score of reviews. Due to market necessity, many improvements are being made in this direction, of which relevance based review ranking is one of the most modern approaches. This relevance based review ranking method is used by Google Maps. The commonly used review filtering methods offered by Google are Newest, Highest rating, and Lowest rating, but relevance based review ranking is cutting-edge. A snapshot of Google Maps reviews ranked by relevance is shown in Figure 1, which shows the review list based on relevance mapping of content. Ranking is fundamentally ordering a set of instances by their relative relevance. This is useful for various applications such as recommendation, text mining, document retrieval, and text summarization.

Fig 1. Google Maps review ranking snapshot based on relevance

There are three categories of ranking algorithms in the literature: Pointwise Ranking, Pairwise Ranking [3], and Listwise Ranking [2]. Table I describes the approach and output space of the ranking algorithms under each category.



TABLE I: Comparison of Pointwise, Pairwise, and Listwise Ranking

Pointwise Ranking
  Description: Chooses a single document and fits a relevance label to it individually.
  Output space: Score or relevance classes.

Pairwise Ranking [3, 6]
  Description: Chooses a pair of documents, comes up with the optimal ordering for that pair, and compares it to the ground truth (i.e., tells whether d1 > d2).
  Output space: (yes/no) for a given document pair; yes means d1 > d2 is true, no means d1 > d2 is false.

Listwise Ranking [2]
  Description: Chooses a list of documents and tries to come up with an optimal ordering for it.
  Output space: Permutation (ranking) of the documents.

Reviews and implicit feedback given by customers showcase their preference towards items. Out of all the existing solutions defined above, the pairwise ranking technique has shown state-of-the-art results. Pairwise ranking compares the features of every review with those of every other review in the dataset individually.

Various recent researchers have explored the pairwise ranking method for a number of applications. In 2018, Yu et al. explored a multiple pairwise ranking algorithm to represent the differences among multiple pairs of items [3]. This ranking scheme is used for Top-N item recommendation, where it has been used to incorporate multiple types of user-item relations into a unified pairwise ranking model [4]. Pairwise review ranking has also been attempted by Bai et al. in 2018. In their work, the characteristics of early reviewers are studied using their posted reviews; the product lifetime is divided into three phases (early, majority, and laggards) and the behavior of early reviewers is analyzed [5]. The most competitive users are classified as early reviewers, and the competition process is further decomposed into multiple pairwise comparisons between two players [5]. A deep online ranking system for e-commerce recommendation is introduced by Yan et al. [6]. In this work, the ranking module is designed using a pairwise learning-to-rank deep neural network [6].

A short introduction to learning to rank is published by Li [7]. Most learning-to-rank algorithms use the SVM technique. The techniques used in pointwise, pairwise, and listwise ranking are mentioned in that paper, along with examples of a few well-known techniques used in the past. We have faced various challenges while applying ranking techniques to a real-world e-commerce application. Some are as follows:
- Identify features to quantitatively measure various aspects of reviews;
- Design and validate an effective methodology to rank reviews;
- Figure out how and which reviews should be filtered.

Our proposed approach is elaborated in this research paper as follows. First, features are extracted from the collected reviews. To extract features from reviews, various computing techniques have been used, such as Natural Language Processing (NLP) and Information Retrieval (IR) techniques. Then, pairwise ranking is done in order to rank reviews based on review relevance. In the end, reviews are split into training and test sets to validate the ranking relevance outcome using four classification algorithms: SVM, Random Forest, Neural Network, and Logistic Regression.

The paper is structured in the following manner. Section II discusses the workflow of the overall process used to design a well-defined and high performance methodology. Feature extraction is discussed in detail in Section III. Section IV describes the methodology used for pairwise ranking, followed by Section V, which discusses the classification models used to validate the pairwise ranking outcome. Section VI provides a detailed outcome of the performed experiment, and finally the concluding remarks are in Section VII.

II. OVERALL PROCESS FLOW
The overall process flow of the proposed approach is depicted in Figure 2. The process is divided into three phases: feature extraction, pairwise ranking, and classification using various base and ensemble classification models. Eight features are extracted from the reviews in the feature extraction phase; the abbreviations of the features are given in Figure 2, and feature extraction is described in detail in Section III. The second phase of the proposed approach is pairwise ranking, which is performed with the help of review segregation and review ranking score computation; the method used for both purposes is elaborated in Section IV. The last phase is classification, where the system validates the performance of the pairwise ranking approach. Four classifiers are used in the complete process, as mentioned in Figure 2 itself. The approach is feasible for all sorts of reviews for any e-commerce application. We have validated the performance of the approach on a medicine dataset which has been collected from various medicine e-commerce applications for some specific medicines. The results depict that Random Forest is the best suited model and gives the maximum classification and ranking accuracy.

Fig 2. Process Flow Diagram
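For illustration, the following Python sketch outlines how the three phases could be wired together. The function names and the placeholder feature are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch of the three-phase flow described above (feature extraction,
# pairwise ranking, classification). Function names are illustrative assumptions.

def extract_features(reviews):
    # Phase 1: turn each review into a feature vector (Rn, Rp, Rs, ... in the paper).
    return [[float(len(r.split()))] for r in reviews]  # placeholder: word count only

def build_pairwise_instances(features, labels):
    # Phase 2: compare reviews pairwise; label 1 if the first review is preferred.
    instances = []
    for i, (fi, li) in enumerate(zip(features, labels)):
        for j, (fj, lj) in enumerate(zip(features, labels)):
            if i != j and li != lj:
                instances.append(([a - b for a, b in zip(fi, fj)], int(li > lj)))
    return instances

if __name__ == "__main__":
    reviews = ["very good product, works well", "delivery was late"]
    labels = [1, 0]  # 1 = informative, 0 = not informative
    pairs = build_pairwise_instances(extract_features(reviews), labels)
    # Phase 3: the pairwise instances are then fed to the classifiers of Section V.
    print(len(pairs), "pairwise training instances")
```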
III. FEATURE EXTRACTION
Feature extraction that covers all the required properties and aspects, and measures them in a quantitative manner, is a much needed task in order to achieve a highly accurate outcome. Hence, this section discusses all the features extracted from reviews.

1) Noun Strength (Rn)
Nouns are subjects and are considered the most informative part of a review [1]. The number of subjects indicates the importance of a review, because only the nouns describe its prime factors (i.e., what the review is about). Therefore, part-of-speech tagging is used to assign a POS tag to each word in a review. POS tagging is done with the help of the spaCy1 library, which helps to determine the nouns in a review. The POS tags for a single product are shown in Figure 3. Here, we are concerned only with words having the NOUN POS tag.

Fig 3. POS Tags of all the reviews for a specific product

Further, to validate noun strength, the Term Frequency-Inverse Document Frequency (TF-IDF) approach is used. The TF-IDF score of the noun tags helps to determine review importance. Equation 1 is used to compute the importance of words in a review (Rn).
Rn = ..... (1)

2) Review Polarity (Rp)
Polarity simply means the emotion expressed in a sentence. The polarity is the sentiment itself, ranging from -1 to +1, where a negative value indicates a negative emotion, a positive value indicates a positive emotion, and 0 means neutral. TextBlob2 is a rule-based sentiment analyzer which helped in determining the polarity of reviews. Review polarity results for 4 sentences are shown in Table II.

TABLE II: Computed Review Polarity Sample
Review                                               Polarity
it is not available regularly                        -0.1
very Good                                            0.91
Delivery could be little faster otherwise okay       0.15625
It was delivered in a secured way                    0

3) Review Subjectivity (Rs)
Subjectivity is a measure of how objective or subjective the sentiment is, and it goes from 0 to 1. Objective expressions are facts, while subjective expressions are opinions that describe a person's feelings. For example, consider the following expressions:
This banana is yellow: Objective
Banana tastes good: Subjective
Here, the first expression is objective, while the second expression is subjective, as it expresses an opinion about the taste of the banana. Subjectivity is also computed using TextBlob. A few review subjectivity results are shown in Table III. The subjectivity scores of a single product are shown in Figure 4.

TABLE III: Computed Review Subjectivity Sample
Review                                                     Subjectivity
Don't want to say                                          0
It was good Bt i find it cheaper in local medical shop.    0.2
Amazing result in evion 400                                0.9
Delivery service and better expire date                    0.5
good product                                               0.6
Very good results                                          0.78

Fig 4. Subjectivity Score of reviews for a specific product

1 https://spacy.io/
2 https://textblob.readthedocs.io/
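To make the first three features concrete, the sketch below computes them with spaCy, scikit-learn's TF-IDF, and TextBlob, the tools the section names. The exact aggregation behind equation (1) is not reproduced in this text, so summing the TF-IDF weights of the noun terms is an assumption made for illustration.

```python
# Sketch: noun strength (Rn), polarity (Rp), and subjectivity (Rs).
# Summing TF-IDF weights over noun terms is an assumed reading of equation (1).
import spacy
from sklearn.feature_extraction.text import TfidfVectorizer
from textblob import TextBlob

nlp = spacy.load("en_core_web_sm")

reviews = [
    "it is not available regularly",
    "very Good",
    "Delivery could be little faster otherwise okay",
]

# TF-IDF over the review corpus of one product.
vectorizer = TfidfVectorizer(lowercase=True)
tfidf = vectorizer.fit_transform(reviews)
vocab = vectorizer.vocabulary_

for idx, text in enumerate(reviews):
    doc = nlp(text)
    nouns = [tok.text.lower() for tok in doc if tok.pos_ == "NOUN"]
    # Noun strength: assumed here as the sum of TF-IDF weights of noun terms.
    rn = sum(tfidf[idx, vocab[n]] for n in nouns if n in vocab)
    blob = TextBlob(text)
    rp = blob.sentiment.polarity        # -1 (negative) .. +1 (positive)
    rs = blob.sentiment.subjectivity    # 0 (objective) .. 1 (subjective)
    print(f"{text!r}: Rn={rn:.3f}, Rp={rp:.3f}, Rs={rs:.3f}")
```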
4) Review Complexity all (Rc1)
Overall review complexity is the percentage of unique words in a review relative to the overall number of unique words in the corpus. Overall review complexity is computed using equation 2.
Rc1 = (unique words in the review / unique words in the corpus) × 100 ..... (2)
Here, the corpus is the full set of reviews for each product.

5) Review Complexity each (Rc2)
Single review complexity is the percentage of unique words in a review relative to the total number of words in that particular review. Each review's complexity is computed using equation 3.
Rc2 = (unique words in the review / total words in the review) × 100 ..... (3)

6) Word Length (Rw): The number of words in a review.

7) Service Tagger (Rd)
Reviews are basically meant to describe a product. So, a dictionary of words is created which marks reviews as service based, delivery reviews, and customer support. Fuzzy matching of every word in a review is done against the words in the dictionary, allowing a Levenshtein distance of at most 1. The Levenshtein distance measures the difference between two sequences and tackles spelling errors in reviews; for example, instead of "My delivery was on time", a review may be wrongly written as "My dilivery was on time". In this case, fuzzy matching helps us match both reviews.

8) Compound Score (Rsc)
To improve the efficiency of the system, another metric, the compound score (Rsc), is used to determine the sentiment of reviews. It is computed using the VaderSentimentAnalyser3 library. The system interprets Rsc values in the following ranges:
Rsc >= 0.5: positive review
Rsc <= -0.5: negative review
-0.5 < Rsc < 0.5: neutral review
This library is based on VADER (Valence Aware Dictionary and sEntiment Reasoner), a lexicon and rule-based sentiment analysis tool specifically tuned to determine sentiments expressed in social media content. It has the ability to find the sentiment of slang (e.g. "SUX!"), emoji, and emoticons (e.g. ":)", ":D"), and it differentiates capitalized word expressions ("I am HAPPY" and "I am happy" are different expressions).

3 https://pypi.org/project/vaderSentiment/2.1/
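A sketch of how the remaining features could be computed follows. The service/delivery keyword dictionary, the helper names, and the edit-distance threshold of 1 are illustrative assumptions; the VADER cut-offs in the text above are not needed for the raw score itself.

```python
# Sketch: review complexity (Rc1, Rc2), word length (Rw), service tagger (Rd),
# and compound score (Rsc). The keyword dictionary and helper names are assumptions.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

SERVICE_WORDS = {"delivery", "service", "support", "courier"}  # assumed dictionary

def levenshtein(a, b):
    # Classic dynamic-programming edit distance, used for fuzzy matching.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def extra_features(review, corpus):
    words = review.lower().split()
    corpus_vocab = {w for r in corpus for w in r.lower().split()}
    rc1 = 100.0 * len(set(words)) / len(corpus_vocab)   # complexity vs. corpus
    rc2 = 100.0 * len(set(words)) / len(words)          # complexity within the review
    rw = len(words)                                     # word length
    # Service tagger: fuzzy match each word against the dictionary (at most 1 edit).
    rd = int(any(levenshtein(w, s) <= 1 for w in words for s in SERVICE_WORDS))
    rsc = SentimentIntensityAnalyzer().polarity_scores(review)["compound"]
    return rc1, rc2, rw, rd, rsc

corpus = ["My dilivery was on time", "good product", "Amazing result in evion 400"]
print(extra_features("My dilivery was on time", corpus))
```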
Set 0 represents a review with label 0 i.e. that is not
informative. Even, reviews those are more subjective and
8) Compound Score (Rsc) discusses about delivery, service, and customer support but
To improve the efficiency of system, another compound score does not talk about the product also come in this set.
(Rsc) metric is used to determine sentiment of reviews. It is We pairwise compare each review of set1 with all reviews of
incorporated using VaderSentimentAnalyser3 library. The set0 and vice-versa.
system gives Rsc values in following range (Ri, Rj,1) where i∈Set1 and j∈Set0 → Ri is
better than Rj
Rsc value >= 0.5 Positive review
(Rj, Ri, 0) where i∈Set1 and j∈Set0 → Rj is
Rsc value <=-0.5 Negative review
-0.5< Rsc value < 0.5 Neutral worse than Ri
This now becomes a classification problem.
This library is taken from VADER (Valence Aware
Dictionary and sEntiment Reasoner). This is a lexicon and Review Score Computation: For a given product, we
rule-based sentiment analysis tool that is specifically tuned to compare each review (Ri) with every other review (Rj) for a
determine sentiments expressed in social media content. It has product and get a win/lose score where win means (Ri) is
ability to find sentiment of Slang (e.g. SUX! ) , Emoji (, ), better than (Rj) and lose means (Ri) is worse than (Rj).
Emoticons ( :), :D ) and difference between capitalized word Further, a review score is computed using equation 4
expressions also (I am HAPPY, I am happy are different
expressions). Review Score = …..(4)
Reviews are then sorted by this review score which results in
showing the most relevant review on top and most irrelevant
3
https://pypi.org/project/vaderSentiment/2.1/ review in the bottom.
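Below is a rough sketch of the review segregation and scoring just described. Since the body of equation (4) is not reproduced in this text, the score is assumed here to be the fraction of pairwise comparisons a review wins; treat it as an illustration rather than the authors' exact formula.

```python
# Sketch: build pairwise instances from Set 1 / Set 0 and rank reviews by an
# assumed win-fraction score (equation 4 is not reproduced in this text).
def build_pairwise_instances(set1, set0):
    # set1 / set0: lists of (review_id, feature_vector) with labels 1 and 0.
    instances = []
    for rid_i, fi in set1:
        for rid_j, fj in set0:
            diff = [a - b for a, b in zip(fi, fj)]
            instances.append((diff, 1))                 # Ri better than Rj
            instances.append(([-d for d in diff], 0))   # Rj worse than Ri
    return instances

def review_scores(reviews, better_than):
    # better_than(fi, fj) -> True if review i is predicted better than review j,
    # e.g. a trained classifier applied to the feature difference.
    scores = {}
    for i, (rid_i, fi) in enumerate(reviews):
        wins = sum(better_than(fi, fj) for j, (_, fj) in enumerate(reviews) if i != j)
        scores[rid_i] = wins / max(len(reviews) - 1, 1)  # assumed win fraction
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy usage with a single-feature heuristic comparator:
reviews = [("r1", [0.9]), ("r2", [0.2]), ("r3", [0.5])]
print(review_scores(reviews, lambda fi, fj: fi[0] > fj[0]))  # most relevant first
```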
V. CLASSIFICATION MODELS
Different classification models use different techniques to fit data based on its properties, which may lead the predicted outcomes of the models to differ from each other and from the actual ground truth. Therefore, we have applied four classification models to find the best performing model for this relevance based review ranking problem. The four applied classification models are: Support Vector Classifier, Neural Network, Logistic Regression, and Random Forest Classifier.

Logistic Regression is basically a binary classifier and transforms its output using the logistic sigmoid function. The logistic sigmoid function returns a probability value which is further mapped to a binary class.

Further, other nonlinear classifiers are applied and tested in order to validate the pairwise ranking outcome. A MultiLayer Perceptron Neural Network classifier is used; it consists of multiple layers (input layer, hidden layers, and output layer) with a non-linear activation function and is trained using the back propagation method. A Support Vector Classifier (SVM) is also tested, which performs classification by finding the hyperplane that differentiates the two classes. Finally, a Random Forest Classifier, which is an ensemble learning algorithm, is used; it can be applied to both classification and regression. Random forests create decision trees on randomly selected data points, get a prediction from each tree, and select the best solution by means of voting.
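As an illustration of this validation step, the minimal scikit-learn sketch below applies the four classifiers named above to pairwise instances; the toy data and hyperparameters are assumptions, not the paper's settings.

```python
# Sketch: validating pairwise instances with the four classifiers named above.
# Data and hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                # 8 features per pairwise instance
y = (X[:, 0] - X[:, 1] > 0).astype(int)      # toy preference label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Neural Network (MLP)": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))
```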
VI. EXPERIMENTAL SETUP AND RESULTS
The experimental setup, dataset, and outcome of the proposed approach are presented in this section in the context of performance evaluation.

A. Dataset
To perform the experiment, medicine reviews are used. The approach has been tested on various categories of datasets, such as the Amazon product dataset, medicine reviews, etc., and it works effectively with all of them. We got some reviews labeled by different people just to learn which types of reviews people like. To showcase the efficiency of the approach, the reviews of a few medicinal categories4 have been taken. The dataset statistics are showcased in Figure 6. The ranking of reviews is a semi-supervised approach. Here, we have taken a training dataset which contains 503 reviews across 5 medicine categories: Vitamin B Tablet, Vitamin D Tablet, Accu-check, Omega 3 Fatty acid, and a medicinal shampoo. Each category contains between 83 and 107 reviews.

Fig 6. Dataset Statistics

Using this model, for any product's reviews we can rank the reviews, getting the most relevant reviews on top and pushing down the irrelevant reviews of the product.

B. Performance measures
The approach needs to be validated at two points: first, the ranking accuracy should be validated, and second, the classification accuracy. Undoubtedly, the system demonstrates its efficiency when both accuracy measures give good results. The two accuracy measures are as follows:

1) Ranking Accuracy Measure: A sorted list of reviews based on the review score (computed using equation 4) is the outcome of the pairwise ranking algorithm. Therefore, to test this hypothesis, a ranking metric is designed as follows. Let the number of reviews labeled as 1 in our dataset be Nlabel=1.
Accuracy = ....... (5)

2) Classification Accuracy Measure: Classification accuracy is computed using the true positive, true negative, false positive, and false negative outcomes. To validate it, equation 6 is used:
ClassificationAccuracy_review = (TP + TN) / (TP + TN + FP + FN) ....... (6)
where TP is True Positive, TN is True Negative, FP is False Positive, and FN is False Negative.

C. Results
The sample results of pairwise review ranking and classification are shown in Figure 7, which depicts the results after experimenting with different models. Figure 7 shows sample training data for Neurobion (Vitamin B) and Evion (Vitamin E). The number of training and test instances for both cases is also displayed in the table. The system is validated on four models: Random Forest, Neural Network, Logistic Regression, and Support Vector Machine. We found the Random Forest Classifier to be the best, giving a classification accuracy of 99.76% and a ranking accuracy of 99.56% for the complete dataset. A sorted review list along with the review ranking score is also produced.

4 http://bit.ly/2UMHaSf
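The body of equation (5) is not reproduced in this text; the sketch below implements one plausible reading of the ranking metric (the fraction of the top Nlabel=1 positions in the sorted list occupied by reviews labeled 1), alongside the standard classification accuracy of equation (6). The ranking metric here should be read as an assumption.

```python
# Sketch: one plausible reading of the ranking accuracy metric (equation 5 is
# not reproduced here) plus the classification accuracy of equation 6.
def ranking_accuracy(sorted_labels):
    # sorted_labels: review labels (1/0) ordered by descending review score.
    n_label1 = sum(sorted_labels)
    top = sorted_labels[:n_label1]
    return sum(top) / n_label1 if n_label1 else 0.0

def classification_accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

print(ranking_accuracy([1, 1, 0, 1, 0, 0]))     # 2 of the top 3 are labeled 1
print(classification_accuracy(tp=40, tn=45, fp=1, fn=2))
```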
Fig 7. Pairwise Ranking and Classification Results

A sample of the top 5 sorted reviews is shown in Figure 8. The dataset along with all features, and the sorted list of reviews after ranking for the Vitamin E dataset, is available at the following url5.

Fig 8. Sample Sorted review list along with score

VII. CONCLUSION AND FUTURE SCOPE
We have presented and empirically evaluated relevance based review ranking. We showed that it is possible to compute a review relevance ranking score with the help of various text features and pairwise ranking techniques. Further, a sorted review list based on review relevance ranking can be showcased. The ranking score and review rank have been validated with the help of four classification models, out of which Random Forest gives the best outcome. Directions for future work include:
● Gathering user upvote data.
● Personalized ranking for every user based on his/her activity, knowing what type of reviews and what type of literature he/she likes. For users who do not upvote reviews, we may use recommendation / collaborative filtering techniques and show the personalized review ranking of similar users.
● Personalized ranking as a further extension of relevance ranking, where we use upvotes per user, learn which type of reviews a person likes, and, based on those helpful upvotes, show on top the reviews which have the same features and relevance as those the person upvotes.
● Adding a readability score in the future if personalized per-user ranking is done.
● Extending the approach to the same product on multiple e-commerce applications.

5 http://bit.ly/2PyUF29

REFERENCES
[1] Seki, Y. (2002). Sentence extraction by tf/idf and position weighting from newspaper articles.
[2] Ravikumar, P., Tewari, A., & Yang, E. (2011, June). On NDCG consistency of listwise ranking methods. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (pp. 618-626).
[3] Yu, R., Zhang, Y., Ye, Y., Wu, L., Wang, C., Liu, Q., & Chen, E. (2018, October). Multiple pairwise ranking with implicit feedback. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (pp. 1727-1730). ACM.
[4] Yu, L., Zhang, C., Pei, S., Sun, G., & Zhang, X. (2018, April). WalkRanker: A unified pairwise ranking model with multiple relations for item recommendation. In Thirty-Second AAAI Conference on Artificial Intelligence.
[5] Bai, T., Zhao, W. X., He, Y., Nie, J. Y., & Wen, J. R. (2018). Characterizing and predicting early reviewers for effective product marketing on e-commerce websites. IEEE Transactions on Knowledge and Data Engineering, 30(12), 2271-2284.
[6] Yan, Y., Liu, Z., Zhao, M., Guo, W., Yan, W. P., & Bao, Y. (2018, September). A practical deep online ranking system in e-commerce recommendation. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 186-201). Springer, Cham.
[7] Li, H. (2011). A short introduction to learning to rank. IEICE Transactions on Information and Systems, 94(10), 1854-1862.
