
Exploring the Ensemble of Classifiers for Sentimental Analysis - A Systematic Literature Review


Ali Athar, Wasi Haider Butt, Muhammad Waseem Anwar, Muhammad Latif, Farooque Azam
Department of Computer Engineering, College of E&ME
National University of Sciences and Technology (NUST), H-12, Islamabad, Pakistan
ali.athar14@ce.ceme.edu.pk, wasi@ceme.nust.edu.pk, waseemanwar@ceme.nust.edu.pk,
mlatif@ceme.nust.edu.pk, farooq@ceme.nust.edu.pk

ABSTRACT
Text classification is a well-known machine learning approach for simplifying domain-specific investigation. It is therefore commonly used in sentimental analysis to achieve particular business goals. Different ensemble approaches are frequently introduced to combine the desired classifiers and improve sentimental classification. However, to the best of our knowledge, no study is available yet that investigates and summarizes the leading ensemble approaches, classifiers, features, tools and datasets altogether in the domain of sentimental analysis. Therefore, in this paper, a Systematic Literature Review (SLR) is performed to identify 31 studies published during 2008-2016. Subsequently, 14 modern ensemble techniques, 26 leading classifiers, 15 benchmark datasets, 19 prominent features and 8 tools are presented in the context of sentimental analysis. This investigation benefits scholars and industrial experts of the domain in making the right choices according to the given requirements.

CCS Concepts
• Computing methodologies ➝ Ensemble methods • Computing methodologies ➝ Machine learning approaches • Computing methodologies ➝ Machine learning algorithms

Keywords
Sentimental analysis; classifiers ensemble; sentimental datasets; sentimental features.

1. INTRODUCTION
The evolution of web technologies has led to a large volume of subjective information that needs to be classified into different sentiment orientation types (e.g. positive, negative and neutral) to fulfil particular business requirements. Sentimental analysis involves the automated classification of large amounts of subjective data into the classes of concern by utilizing the concepts of machine learning, data mining and Natural Language Processing (NLP). As sentimental analysis is required in various business domains, it is frequently researched in order to achieve certain improvements. For example, various ensemble techniques are introduced [1]-[3] to combine the desired classifiers and improve classification accuracy. Moreover, various classifiers, such as the Support Vector Machine (SVM), are studied to examine their applicability to sentimental analysis. Furthermore, many publicly available datasets are provided to explore and tune a particular sentimental classification approach. Although aspects of sentimental analysis such as ensemble approaches, classifiers and datasets are well researched individually, to the best of our knowledge no study is available yet that investigates and summarizes the leading ensemble approaches, classifiers, features, tools and datasets altogether. Therefore, in this paper, an SLR is performed to answer the following four research questions:

RQ 1: What are the leading research studies during 2008-2016 where SVM is ensembled with other classifiers for sentimental analysis?
    RQ 1.1: What are the prominent ensemble approaches for sentimental analysis?
    RQ 1.2: What are the major classifiers, ensembled with the SVM, for sentimental analysis?
RQ 2: What are the leading exploited features for sentimental analysis during 2008-2016?
RQ 3: What are the significant datasets, utilized during 2008-2016, for sentimental analysis?
RQ 4: What are the frequently used tools to perform sentimental analysis during 2008-2016?

[Figure 1. Research overview]

The summary of the research is shown in Figure 1. The review protocol is developed (Section 2.1) to select 31 studies as per the selection / rejection criteria (Section 2.1.1). Subsequently, 14 ensemble techniques (Section 3.1), 26 classifiers (Section 3.4), 15 datasets (Section 3.5), 19 features (Section 3.2) and 8 tools (Section 3.3) are presented. The limitations and the answers to the RQs are provided in Section 4. Finally, the conclusion is given in Section 5.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
ICMLC’17, February 24–26, 2017, Singapore, Singapore
© 2017 ACM. ISBN 978-1-4503-4817-1/17/02…$15.00
DOI: http://dx.doi.org/10.1145/3055635.3056601

2. METHODOLOGY
A Systematic Literature Review (SLR) [4] is used to perform this study.

2.1 Review Protocol Development

2.1.1 Selection and Rejection Criteria
The selection and rejection criteria are defined to accomplish this SLR with the following four rules:
1. Only studies dealing with sentimental classification should be selected. Furthermore, the study should use an ensemble method where the SVM classifier is used as a base learner along with one or more other classifiers.
2. The selected study should be published during 2008-2016.
3. The selected study should be published in one of the four scientific repositories, i.e. IEEE, Springer, ACM and Elsevier.
4. The selected study should have a decisive positive impact concerning sentimental classification through an ensemble approach. Furthermore, the selected study should be supported by solid facts and experiments.

2.1.2 Search Process
We use different search terms like sentimental classification, ensemble, SVM etc. to attain the required search results. The summary is given in Table 1.

Table 1. Search terms and results (number of search results per repository)

| Search Term | IEEE | ACM | Elsevier | Springer |
|---|---|---|---|---|
| Sentimental classification | 6788 | 5980 | 7605 | 4781 |
| Ensemble | 5902 | 4763 | 6431 | 5943 |
| SVM | 8930 | 9637 | 9329 | 7932 |
| Sentimental Ensemble | 3890 | 5631 | 5865 | 4734 |
| SVM Ensemble | 4574 | 5950 | 7361 | 5037 |

We start searching by entering simple terms as shown in Table 1. After scanning the initially selected studies, we use more meaningful terms like “majority voting sentimental”. However, we only include the basic terms in Table 1 due to space limitations. Terms containing multiple words return a huge number of search results, so we use different filters to limit them.

The search process is completed through sequential steps. For example, we discard multiple search results by reading the title. Similarly, a number of studies are rejected after reading the abstract. Some studies are rejected after reading the complete contents. We finally select 31 studies that completely follow our selection and rejection rules (Section 2.1.1).

2.1.3 Quality Assessment
We try to ensure the high quality of the selected studies so that the outcomes of this SLR are reliable: 1) The data evaluation of the selected study should be based on solid evidence. 2) The objective of this research is to include the most recent research. Therefore, we try our best to include the most recent studies: 67% of the studies are from 2013 to 2016, and overall 83% of the studies are from 2011 to 2016, as shown in Figure 2. 3) Originality of the research is also very important. Therefore, we have selected all the studies from four well-known and authentic scientific databases, i.e. IEEE, Springer, ACM and Elsevier.

[Figure 2. Number of selected studies per year (bar chart titled "Summary of selected studies per year"; x-axis: Year; y-axis: number of studies, 0 to 8)]

2.1.4 Data Extraction and Synthesis
We define a complete template to extract and analyze the relevant data of the selected studies as shown in Table 2. We analyze the extracted data to identify the features, ensemble technique, classifiers, datasets and tools from each selected study. In addition, we also extract the basic information from each selected study for the accomplishment of the investigation.

Table 2. Data extraction template

| Element | Details |
|---|---|
| Basic Info | Author, title, publication year, publisher and type (conference or journal) |
| Data Extraction | |
| Outline | The objective of the selected study |
| Results | The ultimate outcomes |
| Assumptions | Limitations of the study |
| Validation | Experimental evaluation or other |
| Data Synthesis | |
| Ensemble Technique | Ensemble technique used for sentimental classification (Table 3) |
| Features | Features selected from the selected studies (Table 4) |
| Tools | Tools utilized for sentimental classification (Table 5) |
| Classifiers | Classifiers used as a base learner along with SVM (Table 6) |
| Datasets | Datasets used for sentimental classification (Table 7) |
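As an illustration only, the four selection rules of Section 2.1.1 can be read as a filter over candidate studies. The sketch below is not part of the review protocol: the `Study` record, its field names, and the reduction of rule 4 to a single boolean flag are all assumptions made for the example.

```python
from dataclasses import dataclass

# Repositories and year range are taken from rules 2 and 3 of Section 2.1.1.
ALLOWED_REPOSITORIES = {"IEEE", "Springer", "ACM", "Elsevier"}
YEAR_RANGE = range(2008, 2017)  # 2008-2016 inclusive


@dataclass
class Study:
    """Hypothetical record for one candidate study (field names are assumptions)."""
    title: str
    year: int
    repository: str
    is_sentiment_classification: bool
    base_learners: tuple  # names of the classifiers combined in the ensemble
    has_experimental_validation: bool  # simplified stand-in for rule 4


def passes_selection(study: Study) -> bool:
    """Return True when the study satisfies rules 1-4 as simplified above."""
    learners = {name.upper() for name in study.base_learners}
    # Rule 1: SVM must be a base learner alongside at least one other classifier.
    uses_svm_ensemble = "SVM" in learners and len(learners) >= 2
    return (
        study.is_sentiment_classification
        and uses_svm_ensemble
        and study.year in YEAR_RANGE                  # rule 2
        and study.repository in ALLOWED_REPOSITORIES  # rule 3
        and study.has_experimental_validation         # rule 4 (simplified)
    )
```

For example, a made-up record such as `Study("example", 2013, "IEEE", True, ("SVM", "NB"), True)` passes the filter, while the same record with repository `"Wiley"` or year `2007` does not. In practice rule 4 is a manual judgement and cannot be reduced to a flag this simply.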

Table 3. Identified ensemble methods for sentimental classification
Study | BG | Boosting | Stacking | FC | RS | AB | Dagging | MV | WC | MCC | BRS | MC | SMOTE | Novel App.

[1] ✓ ✓ ✓ ✓
[2] ✓ ✓ ✓ ✓ ✓ ✓ ✓
[3] ✓
[5] ✓ ✓
[6] ✓ ✓ ✓
[7] ✓ ✓
[8] ✓ ✓
[9] ✓ ✓ ✓ ✓
[10] ✓ ✓
[11] ✓
[12] ✓
[13] ✓ ✓ ✓ ✓
[14] ✓
[15] ✓
[16] ✓
[17] ✓
[18] ✓
[19] ✓
[20] ✓
[21] ✓ ✓ ✓ ✓ ✓
[22] ✓ ✓
[23] ✓ ✓
[24] ✓ ✓
[25] ✓ ✓
[26] ✓
[27] ✓
[28] ✓
[29] ✓
[30] ✓
[31] ✓
[32] ✓
Bagging (Bootstrap) = BG, Majority Voting = MV, AdaBoost = AB, Weighted Combination = WC, Meta Classifiers Combination = MCC, Bagging Random Space = BRS, Meta Cost = MC
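Two of the techniques named in the legend above, Majority Voting (MV) and Weighted Combination (WC), are simple enough to sketch directly. The following is a minimal illustration in plain Python, not the procedure of any reviewed study: it assumes each base classifier (for example SVM, Naïve Bayes and Maximum Entropy) has already produced one label for a document, and the function names and tie-breaking rule are assumptions of this sketch.

```python
from collections import Counter, defaultdict


def majority_vote(predictions):
    """Return the label predicted by the most base classifiers.

    `predictions` holds one label per base classifier.
    Ties are broken in favour of the label that appears first.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:  # first label that reaches the top count wins
        if counts[label] == best:
            return label


def weighted_vote(predictions, weights):
    """Return the label with the largest summed classifier weight."""
    if len(predictions) != len(weights):
        raise ValueError("one weight per prediction is required")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)
```

With the made-up votes `["positive", "negative", "positive"]`, `majority_vote` returns `"positive"`. With the hypothetical weights `[0.2, 0.9, 0.3]` (for instance, validation accuracies), `weighted_vote` returns `"negative"`, because the single heavily weighted classifier outweighs the other two.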

3. RESULTS

3.1 Ensemble Techniques
We analyze the 31 selected studies and find 14 leading ensemble approaches for sentimental analysis as given in Table 3.

3.2 Feature Selection
We identify 19 features, frequently used in the domain of sentimental analysis, as given in Table 4. Some studies utilize more than one feature simultaneously for sentimental analysis, as shown in the Studies column of Table 4.

Table 4. Prominent features for sentimental analysis

| Sr.# | Feature Name | Studies |
|---|---|---|
| 1 | Semantic | [29],[9] |
| 2 | SentiWordNet | [16],[29],[28] |
| 3 | Product attributes | [17] |
| 4 | Word Pairs and Word Relation | [2],[6],[20],[29],[31],[11],[3],[27] |
| 5 | N-gram | [1],[6],[7],[8],[10],[11],[13],[14],[15],[16],[18],[19],[20],[21],[22],[23],[24],[25],[26],[28],[30],[32] |
| 6 | POS | [2],[6],[29] |
| 7 | Hashing | [3],[10] |
| 8 | Dependency relations | [6] |
| 9 | Length | [24],[28],[11] |
| 10 | Polarity Dictionary | [24] |
| 11 | Abbreviation | [24] |
| 12 | Negation | [24] |
| 13 | Stems | [24] |
| 14 | Clustering | [24] |
| 15 | ALLCAPS | [24] |
| 16 | VSM (Vector Space Model) | [25] |
| 17 | Sentiment | [9],[12] |
| 18 | Stylometric | [9] |
| 19 | Numbers | [10] |

3.3 Leading Tools
We identify 8 tools / frameworks that are commonly used to perform various tasks (e.g. pre-processing, training, classification etc.) for sentimental analysis as shown in Table 5.

Table 5. Identified tools / frameworks

| Tools | Studies |
|---|---|
| WEKA | [1],[2],[3],[8],[13],[15],[17],[18],[19],[20],[22],[23],[27],[29],[30] |
| Rapid miner | [7],[9] |
| MATLAB | [7],[31] |
| YamCha | [11] |
| LibSVM | [6],[25],[26] |
| Mallet2 tool | [26] |
| SVMLight | [22],[32] |
| Sentiment analysis tool | [14] |

3.4 Classifiers Utilized
We find 26 classifiers that are frequently ensembled with SVM to perform sentimental analysis as given in Table 6.

Table 6. Leading classifiers for sentimental analysis

| Classifier | Studies |
|---|---|
| RBF NN | [29] |
| Naïve Bayes (NB) | [1],[2],[3],[5],[6],[8],[9],[10],[15],[19],[20],[22],[26],[27],[29],[32] |
| Decision trees | [1],[9],[15],[19] |
| BLR | [2] |
| RF | [3],[19] |
| CRF | [5] |
| BPN | [7] |
| PNN | [1],[10],[20],[23] |
| K neighbors | [1],[10],[20],[23] |
| Max. Entropy | [1],[5],[6],[12],[15],[18],[20],[24],[26],[32] |
| (LDA) | [2],[7] |
| (LR) | [2],[3],[9],[23] |
| CRF | [11],[15] |
| ANN | [12] |
| (HMM) | [16] |
| BN | [15],[19],[27],[29] |
| (CB) | [20] |
| (RBF) | [23] |
| MLP | [23] |
| ELM | [31] |
| (BPNN) | [31] |
| Senti Strength | [22] |
| Scoring | [32] |
| SBC | [14] |
| (GIBC) | [14] |
| RBC | [14] |

3.5 Important Datasets
We identify 15 benchmark datasets for sentimental analysis as given in Table 7.

Table 7. Important datasets for sentimental analysis

| Dataset | Open Source | Studies |
|---|---|---|
| Movie | Yes | [1],[5],[6],[8],[18],[14],[25] |
| Twitter | Yes | [3],[12],[22],[23],[24],[28],[29],[19] |
| Product Data | Yes | [1],[2],[5],[7],[21],[13],[10],[14],[20],[26],[30] |
| Medical | Yes | [1],[2],[27],[10] |
| SMD | Yes | [31] |
| B News | No | [11],[15] |
| E-Comer | Yes | [17] |
| NTCIR | Yes | [32] |
| Poem | Yes | [28] |
| Goog. Ad | No | [9] |
| Book | Yes | [8] |
| Shopping | Yes | [8] |
| My Space | Yes | [14] |
| Montada | Yes | [16] |
| AFF | Yes | [16] |
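N-gram is by far the most frequently used feature in Table 4. As a rough illustration of what such a feature looks like, the sketch below extracts word n-grams with plain Python. The whitespace tokenization, the lowercasing and the function names are assumptions of this example; the reviewed studies may tokenize and weight n-grams differently.

```python
from collections import Counter


def word_ngrams(text, n=2):
    """Return the list of word n-grams in `text` (lowercased, whitespace tokens)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, max_n=2):
    """Count all word n-grams for n = 1..max_n, a simple bag-of-n-grams feature map."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(word_ngrams(text, n))
    return counts
```

For the made-up input `"not good at all"`, `word_ngrams(text, 2)` yields `["not good", "good at", "at all"]`. The bigram `"not good"` preserves the negation that a unigram-only representation would lose, which is one common motivation for using n-grams in sentiment classification.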

4. RQ’s ANSWERS AND LIMITATIONS
The answer to RQ1 is provided in Table 3 and Table 6. Furthermore, the answers to RQ2, RQ3 and RQ4 are provided in Table 4, Table 7 and Table 5 respectively. Although we select four well-known scientific repositories for this SLR, there is a fair probability that we missed a few studies published in other repositories (e.g. Wiley). However, this limitation does not majorly affect the ultimate results of this SLR due to the selection of high-impact scientific databases.

5. CONCLUSION AND FUTURE WORK
This study explores the modern sentimental analysis trends. A Systematic Literature Review (SLR) is executed to identify 31 studies published in 2008-2016. As a result, 14 modern ensemble techniques, 26 leading classifiers, 15 benchmark datasets, 19 prominent features and 8 tools are identified in the context of sentimental analysis. Although the outcomes of the SLR are beneficial for scholars and industrial experts, it is essential to perform a comparative analysis of the identified ensemble approaches, features, classifiers, tools and datasets in order to provide in-depth details. We intend to perform such an analysis in the next article.

6. REFERENCES
[1] G. Wang, et al., “Sentiment classification: The contribution of ensemble learning,” Journal of Decision Support Systems, 2013, Volume 57, Pages 77–93.
[2] Aytug Onan, Serdar Korukoğlu, Hasan, “A Multiobjective Weighted Voting Ensemble Classifier Based on Differential Evolution Algorithm for Text Sentiment Classification,” JESA 2016, Vol. 62, Pages 1–16.
[3] N.F.F. da Silva, et al., “Tweet sentiment analysis with classifier ensembles,” Journal of Decision Support Systems, Volume 66, October 2014, Pages 170–179.
[4] Kitchenham, Barbara. “Procedures for Performing Systematic Reviews.” Keele, UK, Keele University 33.2004 (2004): 1-26.
[5] Fersini, Messina, F. Pozzi, “Sentiment analysis: Bayesian Ensemble Learning,” DSS 2014, Vol 68, Pages 26–38.
[6] Rui Xia, Chengqing Zong, Shoushan, “Ensemble of feature sets and classification algorithms for sentiment classification,” JIS 2011, Vol 181, Is. 6, Pages 1138–1152.
[7] G. Vinodhini, R.M. Chandrasekaran, “A Comparative Performance Evaluation of Neural network based approach for Sentiment Classification of Online Reviews,” Journal of King Saud University - Computer and Information Sciences 2016, Vol 28, Issue 1, Pages 2–12.
[8] Cagatay Catal, Mehmet Nangir, “A Sentiment Classification Model Based On Multiple Classifiers,” Applied Soft Computing 2017, Vol 50, Pages 135–141.
[9] Michael A, Abrahams, T. Ragsdale, “Ensemble learning methods for pay-per-click campaign management,” ESA 2015, Vol 42, Issue 10, Pages 4818–4829.
[10] Johannes V. Lochter, Rafael F. Zanetti, Dominik Reller, Tiago A. Almeida, “Short Text Opinion Detection using Ensemble of Classifiers and Semantic Indexing,” ESA 2016, Vol 62, Pages 243–249.
[11] Asif Ekbal, Sriparna Saha, “Combining feature selection and classifier ensemble using a multi objective simulated annealing approach: application to named entity recognition,” Soft comp. 2013, Volume 17, Issue 1, pp 1–16.
[12] Yaowei, Rao, Xueying Zhan, Huijun Chen, Maoquan Luo, Jian Yin, “Sentiment and emotion classification over noisy labels,” KBS 2016, Vol. 111, pp 207–216.
[13] G Vinodhini, “A sampling based sentiment mining approach for e-commerce applications,” JIPM 2017, Vol 53, Issue 1, Pages 223–236.
[14] Rudy Prabowo, Thelwall, “Sentiment analysis: A combined approach,” Journal of Informetrics 2009, Vol 3, Issue 2, pp 143–157.
[15] Sriparna Saha, Asif Ekbal, “Combining multiple classifiers using vote based classifier ensemble technique for named entity recognition,” JD&KE 2013, Vol. 85, Pages 15–39.
[16] Ahmed Abbasi, Hsinchun Chen, Sven Thoms, and Tianjun Fu, “Affect Analysis of Web Forums and Blogs Using Correlation Ensembles,” IEEE Trans. on Knowl. & Data Eng. 2008, Vol. 20, No. 9.
[17] G. Vinodhini and R. M. Chandrasekaran, “Sentiment Mining Using SVM-Based Hybrid Classification Model,” Springer 2013, Volume 246, pp 155-162.
[18] Yumi Lin, Xiaoling Wang, Jingwei Zhang, Aoying Zhou, “Assembling the Optimal Sentiment Classifiers,” 13th International Conference, Paphos, Cyprus, November 28-30, 2012. Proceedings, vol. 7651, pp. 271-283, 2012.
[19] Yun Wan, Qigang Gao, “An Ensemble Sentiment Classification System of Twitter Data for Airline Services Analysis,” IEEE 15th Data Mining Workshops, 2015.
[20] Ying Su, Wang, Hongmiao, “Ensemble Learning for Sentiment Classification,” Spri. 2013, Vol. 7717, pp 84-93.
[21] Matthew Whitehead, Larry Yaeger, “Sentiment mining using Ensemble classification model,” Springer B.V. 2010.
[22] Tawunrat Chalothorn, Jeremy Ellman, “Simple approaches of sentiment analysis via ensemble learning,” Springer-Verlag Berlin Heidelberg, vol. 339, pp 631-639, 2015.
[23] Joseph Prusa, Khoshgoftaar, David J. Dittman, “Using Ensemble Learners to Improve Classifier Performance on Tweet Sentiment Data,” IEEE 16th ICIRI 2015.
[24] Matthias Hagen, Potthast, Büchner, Stein, “Twitter Sentiment Detection via Ensemble Classification Using Averaged Confidence Scores,” Spr. 2015, pp. 741–754.
[25] Lin Dai, Hechun Chen, Xuemei Li, “Improving Sentiment Classification Using Feature Highlighting and Feature Bagging,” 11th IEEE ICDMW 2011, Pages 61-66.
[26] Zhongqing Wang, Li, Zhou, Peifeng, Zhu, “Imbalanced Sentiment Classification with Multi-Strategy Ensemble Learning,” Proceedings Asian Language Processing, 2011.
[27] Wenjia Wang, “Heterogeneous Bayesian Ensembles for Classifying Spam Emails,” proceedings Neural Net., 2010.
[28] Vipin Kumar, Sonajharia Minz, “Multi-view Ensemble Learning for Poem Data Classification Using SentiWordNet,” Advanced Computing and Informatics Proceedings of ICACNI 2014, vol. 27, pages 57-66.
[29] Ammar Hassan, Ahmed Abbasi, Daniel Zeng, “Twitter Sentiment Analysis: A Bootstrap Ensemble Framework,” International Conference on Social Computing, 2013.
[30] G. Vinodhini and R. M. Chandrasekaran, “Sentiment Mining Using SVM-Based Hybrid Classification Model,” Computational Intelligence, Cyber Security and Computational Models, Volume 246, pp 155-162, 2013.
[31] Feng Wang, Yongquan Zhang, Qi Rao, Kangshun Li, H. Zhang, “Exploring mutual information-based sentimental analysis with kernel-based extreme learning machine for stock prediction,” Soft Computing 2016, pp 1-13.
[32] Bin Lu, Benjamin K. Tsou, “Combining a large sentiment lexicon and machine learning for subjectivity classification,” Proceedings of the Ninth International Conference on Machine Learning and Cybernetics, 11-14 July 2010.
