
Multilabel Aspect-Based Sentiment Classification

for Abilify Drug User Review


Ashok Kumar J, Abirami S, Tina Esther Trueman
Information Science and Technology, Anna University, Chennai, India
jashokkumar83@auist.net, abirami@auist.net, tina_trueman@auist.net

Abstract—Multilabel text classification plays an important role in text mining applications such as sentiment analysis and health informatics. In this paper, we propose a multilabel aspect-based sentiment classification model for Abilify drug user reviews. First, we employ preprocessing techniques to improve the quality of the data. Second, term frequency-inverse document frequency (TF-IDF) features are extracted from a bag-of-words (BoW) representation. Third, a joint feature selection (JFS) method with Information Gain (IG) is applied to select label-specific features and label-sharing features. The multilabel classification task can be solved using problem transformation approaches, adapted algorithm approaches, and ensemble approaches. Finally, we study three problem transformation approaches, binary relevance (BR), classifier chains (CC), and label powerset (LP), to classify Abilify user reviews into a set of aspect term sentiments (ATS). The baseline classifiers Naïve Bayes (NB), decision tree (DT), and support vector machine (SVM) are employed on both feature sets. The proposed method is evaluated on multilabel metrics such as accuracy, Hamming loss, F1-micro averaged score, and accuracy per label. The empirical results show that the support vector machine outperforms the other classifiers.

Keywords—Multilabel classification, Aspect sentiment ratings, Binary relevance, Classifier chains, Label powerset, Drug reviews.

2019 11th International Conference on Advanced Computing (ICoAC). 978-1-7281-5286-8/20/$31.00 ©2020 IEEE. DOI: 10.1109/ICoAC48765.2019.246871

I. INTRODUCTION

Online social media produces a large amount of opinion-based text in domains such as products, movies, politics, and health care. Sentiment classification determines whether a given text expresses a positive or negative sentiment [1-2]. This text helps to identify users' interest in buying a product, watching a movie, or taking a medicine. In this study, we focus on online health information services for drugs and medications. In particular, we consider user reviews of Abilify Oral (also called Aripiprazole). Abilify Oral is used to treat mood disorders and severe mood swings such as schizophrenia, bipolar disorder, and irritability. It restores certain chemical balances in the brain to improve concentration. Abilify Oral can be prescribed to patients in different age groups, and the drug reaction varies from person to person. Based on this reaction, drug users and caregivers express their opinions about drugs and their medications in the form of text, reviews, posts, and comments. Reviews are associated with specific star ratings ranging from one to five stars, which indicate very bad, bad, ok, good, and very good for the drug's effectiveness, ease of use, and satisfaction. Generally, sentiment classification is performed as either a binary-class or a multi-class problem.

In this paper, we study a multilabel classification problem for drug reviews and their association with a set of aspect ratings. Multilabel classification techniques classify an instance or observation into a set of aspect labels. For instance, the drug user review "I used to be extremely impatient, irritable, and had frequent anger bursts. After I started Abilify 5mg together with Bupropion 300mg, I radically changed. Now I am extremely patient, but I miss my old self as an enterprising person. Anyway, my family is thankful I am under treatment so I can manage my life more wisely." is associated with five-star ratings on effectiveness, ease of use, and satisfaction. These aspect ratings are labeled as positive or negative sentiment for the multilabel classification task. Moreover, the drug user data contain the patient condition, age, gender, treatment year, comment, and a set of aspects, and involve hundreds or thousands of word features [5]. Therefore, we apply a joint feature selection (JFS) method using Information Gain (IG) to select label-specific and label-sharing features. Three main approaches are used to solve the multilabel classification problem, namely, problem transformation, adapted algorithm, and ensemble approaches. This paper investigates the problem transformation approach for multilabel aspect-based classification.

The rest of this paper is organized as follows. Section II presents related work in multilabel classification. Section III describes the proposed multilabel aspect rating classification for drug reviews. Multilabel evaluation metrics (example-based metrics, label-based metrics, and rank-based metrics) are explained in Section IV. Section V presents the results and discussion. Section VI concludes this paper with future work.

II. RELATED TASKS

Researchers have shown much interest in multilabel classification techniques, which classify an event into a set of labels. Generally, multilabel classification research deals with text, image, audio, and video data. Specifically, researchers have focused on applications such as sentiment classification [1-2], [13-15], text categorization [3], multimedia, biology, chemical analysis, social network mining, medical diagnosis, image classification [17], nonintrusive load monitoring [7], fault detection in induction motors [6], and blood pressure monitoring [9]. In particular, Liu et al. proposed a multilabel sentiment classification approach for microblogs that empirically studies text with three sentiment lexicons using the bag-of-words representation [2]. Lee et al. studied memetic feature selection for multilabel text using label frequency difference, indicating that memetic feature selection provides an optimal solution [3]. Qiu et al. developed a framework to predict ratings for non-rated text reviews, determining the ratings from aspects and their corresponding sentiments [4]. Moreover, Rahmawati et al. investigated problem transformation and adapted algorithm approaches for Indonesian news articles; their study achieved the best performance using a combination of TF-IDF feature selection, uncertainty feature selection, label ranking, and the support vector machine (SVM) algorithm [5]. Georgoulas et
al. used a multilabel framework for detecting fault conditions in an induction motor [6]. Furthermore, Huang et al. proposed a joint feature selection (JFS) method for multilabel classification that learns both label-specific features and shared features [7]; this method inspired the feature selection applied in the proposed work. Afzaal et al. proposed a multi-aspect opinion classification model for text reviews, generating N-gram and POS-tag features for multilabel classification [15]. Douibi et al. applied a multilabel classification approach to analyze and improve ambulatory blood pressure monitoring for patients [9]. Kashef et al. studied multilabel feature selection from a survey perspective [10]. Qu et al. predicted user review ratings using the bag-of-opinions method [11]. Pang et al. addressed multiclass and regression versions of the rating-inference problem, which infers similar labels for similar items, using a meta-algorithm [12]. Existing research has focused on multi-aspect rating classification as a multiclass problem; none of the above authors discusses text reviews with a set of aspect ratings in a multilabel setting. Therefore, we propose a multilabel aspect rating classification model for drug reviews.

III. A MULTILABEL ASPECT-BASED CLASSIFICATION

The proposed multilabel aspect rating classification model consists of five components: data acquisition, pre-processing, feature representation, feature selection, and problem transformation methods for multilabel classification. In the problem transformation methods, Naïve Bayes (NB), decision tree (DT), and support vector machine (SVM) classifiers are employed for multilabel classification. The individual components of the proposed model are explained as follows.

A. Data Acquisition
We acquired the data using the Beautiful Soup Python package from WebMD, an online health and medical information service provider. Drug users rated their experience on a scale of one to five stars for the drug's effectiveness, ease of use, and satisfaction. In total, 1722 instances were obtained for Abilify Oral, including condition, gender, treatment duration, patient type (patient or caregiver), ratings, and text reviews. For example, the review "This drug changed my life. Helped me so much! I must say I do not know where I would be without it." is rated 5-star, 3-star, and 4-star for the drug's effectiveness, ease of use, and satisfaction, respectively. These ratings are used for multilabel classification. The ratings fall on five scales: very bad (1), bad (2), ok (3), good (4), and very good (5). If a review is rated greater than or equal to three stars, the instance is labeled as one for effectiveness, ease of use, or satisfaction, respectively; otherwise, it is labeled as zero.

B. Pre-processing
The pre-processing method is employed to improve the quality of the data as follows. First, the extracted instances are converted from upper case to lower case. Second, punctuation marks (!,",#,$,%,&,',(),*, …) and stop words (a, an, the, …) are removed from the data. Third, a tokenization method splits the data into individual words. Finally, inflected words are reduced to their base form or stem using Snowball stemming. For instance, the words "associations" and "associated" are stemmed into "associ."

C. Feature Representation
Feature representation converts the preprocessed data into machine-readable attributes and values. In this paper, we use TF-IDF (term frequency-inverse document frequency) to weight a word in the document [13]. The word is weighted based on two scores, TF and IDF. TF counts the frequency of a word in the document. IDF increases the weight of rarely occurring words and decreases the weight of the most frequent words in a set of documents. TF-IDF is defined mathematically as in equation (1).

    w_{i,d} = tf × idf = tf_{i,d} × log(N / df_i)    (1)

where tf_{i,d} refers to the number of occurrences of word i in text document d, N refers to the total number of documents in the corpus, and df_i refers to the number of documents in which word i occurs.

D. Joint Feature Selection
Feature selection is one of the most important techniques in the field of machine learning. It increases classification performance and reduces model complexity and overfitting. Generally, feature selection techniques are classified into three categories, namely, filter methods, wrapper methods, and embedded methods [7], [12]. Filter methods select a subset of features based on their correlation coefficients. Wrapper methods select a subset of features by training a model. Embedded methods select a subset of features by combining filter and wrapper methods. In this paper, we employ a joint feature selection (JFS) method for multilabel learning. This method selects a subset based on label-specific features and label-shared features [7]. In particular, we use Information Gain (IG) to select joint features. IG measures the feature relevance of single-label data in filter methods, as shown in equation (2).

    IG(a, c) = −Σ_{i=1}^{l} p(c) log p(c) + p(a) Σ_{i=1}^{l} p(c|a) log p(c|a)    (2)

where p(c) refers to the probability of occurrence of class c, p(a) refers to the probability of occurrence of attribute a, and p(c|a) denotes the conditional probability of class c given attribute a. For multilabel learning, a subset of features is selected for each label individually using IG with a ranking threshold. Then, all label-specific features are combined to create a new low-dimensional subset that includes label-specific features and label-sharing features. For example, let (x1, x2, x3, …, xn) be a set of features and (y1, y2, y3) be the set of labels for each instance in the high-dimensional feature space. Let (x2, x4, x8, x10), (x4, x5, x10), and (x3, x4, x9) be the label-specific features obtained using IG for y1, y2, and y3, respectively. Then, these individual feature sets are combined to create a new low-dimensional feature space (x2, x3, x4, x5, x8, x9, x10).

E. Problem Transformation Approach
Notation: let X = R^D be the input space and Y = {0,1}^L be the target space.

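As an illustrative sketch of the pipeline in Sections III-A to III-D (rating binarization, preprocessing, TF-IDF weighting, and per-label feature selection with a union of label-specific features), the following uses scikit-learn; this is not the authors' implementation, mutual information is used here as a stand-in for the IG ranking, and the reviews, ratings, and k value are made-up examples.

```python
# Sketch: binarize star ratings into multilabels, extract TF-IDF
# features, select label-specific features per label, take the union.
# Assumes scikit-learn; data below is made up for illustration.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

reviews = [
    "this drug changed my life helped me so much",
    "terrible side effects could not sleep at all",
    "works well but hard to remember the schedule",
    "no improvement and very hard to use",
]
# Star ratings (1-5) for effectiveness, ease of use, satisfaction.
stars = np.array([[5, 3, 4], [1, 2, 1], [4, 2, 4], [2, 1, 2]])
Y = (stars >= 3).astype(int)        # >= 3 stars -> positive label (1)

# TF-IDF over a lowercased, stop-word-filtered bag of words.
vec = TfidfVectorizer(lowercase=True, stop_words="english")
X = vec.fit_transform(reviews)

# Label-specific features per label (mutual information as the
# information-gain analogue), then the union of all selected indices.
selected = set()
for j in range(Y.shape[1]):
    sel = SelectKBest(mutual_info_classif, k=5).fit(X, Y[:, j])
    selected |= set(np.flatnonzero(sel.get_support()))

X_jfs = X[:, sorted(selected)]      # low-dimensional joint feature space
print(X.shape, "->", X_jfs.shape)
```

The union step mirrors the paper's example, where per-label subsets such as (x2, x4, x8, x10) and (x4, x5, x10) merge into one shared low-dimensional space.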
Then, the multilabel training data are defined as in equation (3).

    D = {(x^(i), y^(i))}, i = 1, …, N, or equivalently

        | x_1^(1)  x_2^(1)  …  x_D^(1) |   | y_1^(1)  y_2^(1)  …  y_L^(1) |
    D = | x_1^(2)  x_2^(2)  …  x_D^(2) | , | y_1^(2)  y_2^(2)  …  y_L^(2) |    (3)
        |    ⋮        ⋮           ⋮    |   |    ⋮        ⋮           ⋮    |
        | x_1^(N)  x_2^(N)  …  x_D^(N) |   | y_1^(N)  y_2^(N)  …  y_L^(N) |

where x^(i) = [x1, x2, …, xD] refers to the feature representation of an instance and y^(i) = [y1, y2, …, yL] refers to the target labels of that instance. If target label j belongs to an instance, then yj = 1; otherwise, yj = 0. The multilabel classification problem can be solved using the transformation approach, the adaptation approach, and the ensemble approach [14-15]. In this section, we focus only on the problem transformation approach to classify multi-aspect ratings for drug reviews. The problem transformation approach transforms the multilabel problem into single-label problems, which are classified using three machine learning algorithms: Naïve Bayes (NB), decision tree (DT), and support vector machine (SVM). Moreover, the problem transformation approach can be studied in three different ways, namely, binary relevance, classifier chains, and label powerset. The individual components of this module are discussed as follows.

1) Binary relevance (BR): BR is the simplest and an effective method for multilabel classification. It trains a single classifier for each label independently. Let (X, Y) be a set of training examples with a set of targets, where X ∈ [x1, x2, x3, …, xn] are the independent features and Y ∈ [y1, y2, y3, …, yn] are the target labels. In the BR method, the problem is transformed into single-label binary problems (X, y1), (X, y2), (X, y3), and so on. Then, each problem is learned using its class information [6], [15], [16].

2) Classifier Chains (CC): The CC method transforms the multilabel classification problem into binary problems in the form of a chain. It trains the first classifier on the input data points and then trains each subsequent classifier on the input space augmented with the previous labels in the chain. Let (X, Y) be the independent features and targets. Then, CC transforms the multilabel data into single-label problems such as (X, y1), ([X, y1], y2), ([X, y1, y2], y3), and ([X, y1, y2, y3], y4). It is similar to the BR method; however, CC forms an ordered chain to maintain label correlation [15], [16].

3) Label Powerset (LP): LP transforms the multilabel problem into a single-label multiclass problem, and multiclass classification algorithms are then applied to the identified unique label combinations. For example, suppose the instances x(1), x(2), x(3), and x(4) carry the label sets y1 = [1,0,1], y2 = [1,1,0], y3 = [1,0,1], and y4 = [1,0,0], respectively. The LP method transforms these instances into a multiclass classification problem such as (x(1), P), (x(2), Q), (x(3), P), and (x(4), R). In particular, the training instances x(1) and x(3) share the same set of labels, where [1,0,1] denotes a positive label for the drug's effectiveness and satisfaction and a negative label for ease of use [5], [16].

F. Baseline Classifiers
In this subsection, we briefly describe the baseline classifiers NB, DT, and SVM [5], [10], [15-16]. The NB classifier is a probabilistic model; it assigns a class label to an instance represented as an independent feature vector. The DT classifier chooses features or attributes of instances using IG to split the set of instances into one class label or another. The SVM method can be used for classification and regression problems; the SVM classifier separates categories into different classes using a maximum-margin hyperplane in the feature space.

IV. EVALUATION METRICS

In this section, we discuss the evaluation metrics used for multilabel classification, namely, example-based metrics, label-based metrics, rank-based metrics, and dataset metrics [10], [15-19]. These metrics measure the performance and effectiveness of the classification algorithms. Let D = {(x^(i), y^(i))} be a set of text documents and true labels, let H be a multilabel classification algorithm, and let z^(i) = H(x^(i)) be the set of predicted labels for text document x^(i).

A. Accuracy
Accuracy measures the average ratio between the intersection and the union of the predicted and actual label sets over all instances, as in equation (4).

    Accuracy(H, D) = (1/|D|) Σ_{i=1}^{|D|} |y^(i) ∩ z^(i)| / |y^(i) ∪ z^(i)|    (4)

B. Hamming Loss
Hamming loss measures the average proportion of misclassified labels between predicted and actual labels over all instances, as in equation (5).

    Hamming loss(H, D) = (1/|D|) Σ_{i=1}^{|D|} |y^(i) Δ z^(i)| / L
                       = (1/|D|) Σ_{i=1}^{|D|} XOR(y^(i), z^(i)) / L    (5)

where L refers to the number of labels and Δ refers to the symmetric difference between the predicted and actual labels. The lower the Hamming loss, the better the predictive model.

C. F1-micro Averaged
The F1-micro averaged score is calculated by combining the micro-averaged precision score and the micro-averaged recall score over all samples, as in equation (6).

D. Accuracy per Label
Accuracy per label calculates the accuracy for all text documents separately to obtain an averaged score per label, as in equation (7).

    Accuracy per label = (TP + TN) / (TP + FP + FN + TN)    (7)

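As a concrete illustration of the example-based metrics above, the following sketch computes accuracy (equation (4)), Hamming loss (equation (5)), and the micro-averaged F1 score (equation (6)) for a toy prediction matrix; this is a NumPy reconstruction of the standard definitions, not code from the paper.

```python
# Toy computation of example-based accuracy, Hamming loss, and
# micro-averaged F1 for multilabel predictions, following the
# standard definitions (the paper's equations (4)-(6)).
import numpy as np

# Rows: instances; columns: effectiveness, ease of use, satisfaction.
y_true = np.array([[1, 1, 1], [0, 1, 0], [1, 0, 1]])
y_pred = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 1]])

# Equation (4): mean of |intersection| / |union| per instance.
inter = np.logical_and(y_true, y_pred).sum(axis=1)
union = np.logical_or(y_true, y_pred).sum(axis=1)
accuracy = np.mean(inter / union)

# Equation (5): mean fraction of label disagreements (XOR) per instance.
L = y_true.shape[1]
hamming_loss = np.mean(np.logical_xor(y_true, y_pred).sum(axis=1) / L)

# Equation (6): micro-averaged F1 pools TP/FP/FN over all labels.
tp = np.logical_and(y_true == 1, y_pred == 1).sum()
fp = np.logical_and(y_true == 0, y_pred == 1).sum()
fn = np.logical_and(y_true == 1, y_pred == 0).sum()
f1_micro = 2 * tp / (2 * tp + fp + fn)

print(round(accuracy, 3), round(hamming_loss, 3), round(f1_micro, 3))
# → 0.778 0.222 0.833
```

Note how accuracy rewards partially correct label sets (2/3 for two of three labels right), while Hamming loss penalizes each wrong label equally.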

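The three problem transformation approaches of Section III-E can also be sketched in code. The paper's experiments use the MEKA tool; as a rough, hedged equivalent, the sketch below uses scikit-learn's multilabel wrappers on made-up data (BR via one classifier per label, CC via a label chain, LP via encoding each unique label combination as one class).

```python
# Sketch of the three problem transformation approaches (BR, CC, LP)
# with scikit-learn; the paper itself uses MEKA, and the data here
# is randomly generated for illustration only.
import numpy as np
from sklearn.multioutput import MultiOutputClassifier, ClassifierChain
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))                 # made-up feature vectors
Y = (rng.random((40, 3)) > 0.5).astype(int)   # 3 binary aspect labels

# Binary relevance: one independent classifier per label.
br = MultiOutputClassifier(LinearSVC()).fit(X, Y)

# Classifier chains: each classifier also sees the previous labels.
cc = ClassifierChain(LinearSVC(), order=[0, 1, 2]).fit(X, Y)

# Label powerset: each unique label combination becomes one class.
combos = LabelEncoder().fit_transform(["".join(map(str, row)) for row in Y])
lp = LinearSVC().fit(X, combos)

print(br.predict(X[:2]).shape, cc.predict(X[:2]).shape, lp.predict(X[:2]).shape)
```

BR and CC predict one bit per label, while LP predicts a single class index that maps back to a full label combination, mirroring the (x(1), P), (x(2), Q) example in Section III-E.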
The F1-micro averaged score of equation (6) pools true positives, false positives, and false negatives over all q labels:

    F1-micro averaged = Σ_{i=1}^{q} 2TP_i / (Σ_{i=1}^{q} 2TP_i + Σ_{i=1}^{q} FP_i + Σ_{i=1}^{q} FN_i)    (6)

V. EXPERIMENTAL RESULTS

We performed multilabel aspect rating classification on Abilify Oral user reviews. In total, 1722 user records were obtained from WebMD using Python with the Beautiful Soup library. These data were labeled for effectiveness (1160 positive and 562 negative), ease of use (1446 positive and 276 negative), and satisfaction (971 positive and 751 negative) based on user ratings. The preprocessing techniques (conversion of upper case to lower case, punctuation and stop word removal, tokenization, and Snowball stemming) were applied to improve the quality of the data. We then extracted 4043 TF-IDF features from all documents for multilabel classification. Moreover, we selected 229 TF-IDF features using the IG method, including label-specific features and label-sharing features. Furthermore, the problem transformation approaches (BR, CC, and LP) were adapted to perform multilabel classification. The baseline classifiers, namely, NB, DT, and SVM, were employed on both feature sets using the MEKA tool [20], an extension of WEKA for multilabel and multi-target classification. The performance of the multilabel system was evaluated with gold-standard tenfold cross-validation using example-based metrics, label-based metrics, and rank-based metrics. Tables I, II, and III show the performance of multilabel aspect rating classification.

TABLE I. CLASSIFIERS PERFORMANCE FOR BINARY RELEVANCE METHOD

Methods   A      HL     F1     APL(0)  APL(1)  APL(2)
NB        0.552  0.379  0.715  0.605   0.689   0.570
DT        0.596  0.343  0.739  0.619   0.797   0.553
SVM       0.607  0.338  0.752  0.626   0.783   0.578
NB_IG     0.621  0.324  0.767  0.644   0.777   0.607
DT_IG     0.653  0.305  0.787  0.662   0.836   0.586
SVM_IG    0.698  0.270  0.823  0.707   0.842   0.641

A – Accuracy, HL – Hamming loss, F1 – Micro averaged score by label, APL – Accuracy per label

TABLE II. CLASSIFIERS PERFORMANCE FOR CLASSIFIER CHAINS METHOD

Methods   A      HL     F1     APL(0)  APL(1)  APL(2)
NB        0.550  0.378  0.715  0.605   0.687   0.575
DT        0.610  0.347  0.746  0.619   0.783   0.557
SVM       0.605  0.346  0.750  0.626   0.775   0.562
NB_IG     0.606  0.336  0.759  0.644   0.739   0.609
DT_IG     0.663  0.312  0.792  0.662   0.819   0.584
SVM_IG    0.703  0.274  0.825  0.707   0.839   0.633

A – Accuracy, HL – Hamming loss, F1 – Micro averaged score by label, APL – Accuracy per label

TABLE III. CLASSIFIERS PERFORMANCE FOR LABEL POWERSET METHOD

Methods   A      HL     F1     APL(0)  APL(1)  APL(2)
NB        0.541  0.385  0.694  0.566   0.720   0.557
DT        0.603  0.354  0.745  0.609   0.777   0.551
SVM       0.628  0.334  0.765  0.634   0.798   0.564
NB_IG     0.606  0.331  0.765  0.635   0.766   0.607
DT_IG     0.672  0.306  0.799  0.669   0.825   0.587
SVM_IG    0.704  0.280  0.823  0.706   0.840   0.614

A – Accuracy, HL – Hamming loss, F1 – Micro averaged score by label, APL – Accuracy per label

Fig. 1. Classifiers performance for the problem transformation approach.

The classifier results are visualized for the BR, CC, and LP methods, as shown in Fig. 1. The SVM classifier outperforms DT and NB on both feature sets (with and without feature selection). In particular, the BR method achieves the lowest Hamming loss (27.0%), and the highest F1-micro averaged score (82.5%) is achieved by the CC method. Moreover, the LP method takes less time to build and test the model.

VI. CONCLUSION

Multilabel classification of social media content has vast applications. The basic problem in multilabel classification is to detect a set of labels. In this paper, we proposed a multilabel aspect rating classification approach for drug users and their comments. Existing research is restricted to text alone, whereas this paper includes drug users' age group, gender, condition, treatment duration, and comments with a set of aspect ratings. In particular, problem transformation approaches were adapted to detect a user's sentiment rating on different targets. The experimental results show that SVM achieves better performance with the JFS method. In the future, we intend to predict patients' conditions with big data.

REFERENCES
[1] Majumder, N., Poria, S., Peng, H., Chhaya, N., Cambria, E., & Gelbukh, A. (2019). Sentiment and sarcasm classification with multitask learning. arXiv preprint arXiv:1901.08014.
[2] Liu, S. M., & Chen, J. H. (2015). A multi-label classification based approach for sentiment classification. Expert Systems with Applications, 42(3), 1083-1093.
[3] Lee, J., Yu, I., Park, J., & Kim, D. W. (2019). Memetic feature selection for multilabel text categorization using label frequency difference. Information Sciences.
[4] Qiu, J., Liu, C., Li, Y., & Lin, Z. (2018). Leveraging sentiment analysis at the aspects level to predict ratings of reviews. Information Sciences, 451, 295-309.
[5] Rahmawati, D., & Khodra, M. L. (2015, August). Automatic multilabel classification for Indonesian news articles. In 2015 2nd International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA) (pp. 1-6). IEEE.
[6] Georgoulas, G., Climente-Alarcon, V., Antonino-Daviu, J. A., Tsoumas, I. P., Stylios, C. D., Arkkio, A., & Nikolakopoulos, G. (2017). The use of a multilabel classification framework for the detection of broken bars and mixed eccentricity faults based on the start-up transient. IEEE Transactions on Industrial Informatics, 13(2), 625-634.

[7] Huang, J., Li, G., Huang, Q., & Wu, X. (2018). Joint feature selection and classification for multilabel learning. IEEE Transactions on Cybernetics, 48(3), 876-889.
[8] Read, J. Multi-label classification. https://users.ics.aalto.fi/jesse/talks/Multilabel-Part01.pdf.
[9] Douibi, K., Settouti, N., Chikh, M. A., Read, J., & Benabid, M. M. (2018). An analysis of ambulatory blood pressure monitoring using multi-label classification. Australasian Physical & Engineering Sciences in Medicine, 1-17.
[10] Kashef, S., Nezamabadi-pour, H., & Nikpour, B. (2018). Multilabel feature selection: A comprehensive review and guiding experiments. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(2), e1240.
[11] Qu, L., Ifrim, G., & Weikum, G. (2010, August). The bag-of-opinions method for review rating prediction from sparse text patterns. In Proceedings of the 23rd International Conference on Computational Linguistics (pp. 913-921). Association for Computational Linguistics.
[12] Pang, B., & Lee, L. (2005, June). Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (pp. 115-124). Association for Computational Linguistics.
[13] Kim, Y., & Zhang, O. (2014). Credibility adjusted term frequency: A supervised term weighting scheme for sentiment analysis and text classification. arXiv preprint arXiv:1405.3518.
[14] Pereira, R. B., Plastino, A., Zadrozny, B., & Merschmann, L. H. (2015). Information gain feature selection for multi-label classification. Journal of Information and Data Management, 6(1), 48.
[15] Afzaal, M., Usman, M., Fong, A. C., & Fong, S. (2019). Multiaspect-based opinion classification model for tourist reviews. Expert Systems, e12371.
[16] Zhang, M. L., & Zhou, Z. H. (2014). A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 26(8), 1819-1837.
[17] Gibaja, E., & Ventura, S. (2015). A tutorial on multilabel learning. ACM Computing Surveys (CSUR), 47(3), 52.
[18] Tsoumakas, G., Katakis, I., & Vlahavas, I. (2009). Mining multi-label data. In Data Mining and Knowledge Discovery Handbook (pp. 667-685). Springer, Boston, MA.
[19] Wu, X. Z., & Zhou, Z. H. (2017, August). A unified view of multi-label performance measures. In Proceedings of the 34th International Conference on Machine Learning (pp. 3780-3788). JMLR.org.
[20] Read, J., Reutemann, P., Pfahringer, B., & Holmes, G. (2016). MEKA: A multi-label/multi-target extension to WEKA. The Journal of Machine Learning Research, 17(1), 667-671.
