IEEE - 45670

Extensive Survey on Feature Extraction and Feature Selection Techniques for Sentiment Classification in Social Media
S. Sathish Kumar (1), Dr. Aruchamy Rajini (2)
(1) Research Scholar, Department of Computer Science, Hindustan College of Arts and Science, Coimbatore 641028.
(2) Assistant Professor, Department of Computer Science, NGM College, Pollachi 624001, Tamil Nadu.

ABSTRACT - Data mining is the process of generating new information from existing datasets using machine learning, statistics and database systems. Its major role is to turn raw data into useful information by using mathematical algorithms to predict future events. Data mining is also known as knowledge discovery in data (KDD). Its key properties are automatic discovery of patterns, prediction of likely outcomes, creation of actionable information and a focus on large datasets and databases. Mining information amounts to a content-oriented extraction based on the features of the data. Access to those features, that is, data retrieval, is provided by a feature extraction mechanism. Various feature extraction methods have been used along with feature selection algorithms for mining large datasets.

Keywords: Knowledge Discovery, Feature Selection Algorithm, Data Retrieval

I. INTRODUCTION
In machine learning, feature selection is the process of selecting a subset of relevant features for building the model; it is also termed variable selection or attribute selection. The selection of features can be made independently of any machine learning algorithm, using only properties of the data. The evaluation of features can be done using dependent and independent measures [5]. With dependent measures, a feature subset is evaluated by monitoring the performance of the learning algorithm. With independent measures, evaluation is done without the algorithm. The types of independent measures are distance, information, dependency and consistency [15]. Feature selection removes redundant data using feature selection algorithms based on filter, wrapper and embedded techniques [7].

Filter techniques are classified into multivariate and univariate, and do not consider the classifier. In wrapper techniques, dependencies between features are considered for training and testing in the feature space [6]. Embedded methods work better than wrapper methods and integrate the feature selection process into model training. Feature extraction is classified into linear and non-linear. Issues of pattern recognition are an important part of feature extraction. Learning mechanisms are classified into supervised, unsupervised and reinforcement learning [12].

To build a classifier model, feature selection uses the available approaches: the filter approach, the wrapper approach and the embedded approach. Feature extraction and feature selection are processes that reduce the inputs for processing and analysis [3]. A larger number of features is not necessarily more informative, because a class may contain irrelevant information or redundant features. Relevant and non-redundant features are to be extracted and selected for effective and efficient classification. Feature selection methods are divided into lexicon-based methods and statistical methods [1]. Terms are eliminated and features are selected from the list of vocabulary features identified; the methods include Information Gain (IG), Chi-Statistic (CHI), Gain Ratio (GR), Document Frequency (DF) and Relief-F [2]. The rest of the paper is organized as follows. Section II presents related works. Section III presents the proposed sentiment classification model. Section IV concludes the paper with future work.

II. RELATED WORKS
Mondher Bouazizi and Tomoaki Ohtsuki [14] (2019) describe sentiment analysis as the automatic collection, aggregation, and classification of data collected online into different emotion classes. The proposed approach of multi-class

10th ICCCNT 2019


July 6 -8, 2019 - IIT, Kanpur,
Kanpur, India
classification achieves an accuracy of 60.2% for 7 different sentiment classes, which, compared with an accuracy of 81.3% for binary classification, emphasizes the effect of having multiple classes.

Hai Ha Do et al. [15] (2019) observe that the current research focus in sentiment analysis is the improvement of granularity at the aspect level, with two distinct aims: aspect extraction and sentiment classification of product reviews, and sentiment classification of target-dependent tweets. Deep learning approaches have emerged as a prospect for achieving these aims because they capture both syntactic and semantic features of text without the high-level feature engineering required by earlier methods. The article provides a comparative review of deep learning for aspect-based sentiment analysis to place the different approaches in context.

Bowen Zhang et al. [16] (2019) note that sentiment analysis is an important task in natural language processing. Previous studies have shown that integrating knowledge rules into conventional classifiers can effectively improve sentiment analysis accuracy. They propose a critic-learning-based convolutional neural network that addresses two shortcomings of such methods. Their method is composed of three key parts: a feature-based predictor, a rule-based predictor and a critic learning network. Extensive experiments show that the proposed method achieves better performance than state-of-the-art methods in sentiment analysis.

Asad Abdi et al. [17] (2019) describe sentiment analysis as the study of opinions expressed in a text. They present a deep-learning-based method, called RNSA, to classify a user's opinion expressed in reviews. RNSA employs a Recurrent Neural Network (RNN) composed of Long Short-Term Memory (LSTM) units to take advantage of sequential processing and to overcome several flaws of traditional methods, in which word order and word information are lost.

Shiyang Liao et al. [18] (2017) propose an approach to understanding real-world situations through sentiment analysis of Twitter data based on deep learning techniques. Deep learning has recently solved problems in computer vision and voice recognition, and the convolutional neural network (CNN) works well for image analysis and image classification. Their results show that the approach achieves better accuracy in Twitter sentiment classification than some traditional methods such as SVM and Naive Bayes.

3.1 DATA PREPROCESSING
Data preprocessing is a data mining technique that involves transforming raw data into an understandable format. Real-world data is often incomplete, inconsistent and lacking in certain behaviors or trends, and is likely to contain many errors. Data preprocessing includes:
• Cleaning
• Integration
• Transformation
• Reduction
The cleaning process fills in missing values in the data and identifies outliers. Cleaning also smooths out noisy data. Integration combines multiple databases and files. Transformation involves normalization, aggregation and generalization. Reduction decreases the number of attributes in the data [4].

3.2 FEATURE EXTRACTION
Feature extraction consists of transforming arbitrary data, such as text or images, into numerical features that are usable for machine learning [12]. It efficiently represents the interesting parts of an image as a compact feature vector: it starts from an initial set of measured data and builds derived values by dimensionality reduction.

3.2.1 Types of Stemmer
A stem is a natural group of words with equal (or very similar) meaning. Stemming reduces a particular word to its base form. Inflectional and derivational stemming are the two types of method [8]. Stemming algorithms can be classified as follows:
Truncating [1. Lovins 2. Porter 3. Paice/Husk 4. Dawson]
Statistical [1. N-Gram 2. HMM 3. YASS]
Mixed [1. Inflectional & Derivational a) Krovetz b) Xerox 2. Corpus Based 3. Context Sensitive]

Stemming:

$$C_j = \frac{1}{|C_j|} \sum_{d_i \in C_j} d_i$$ … (1)

where d_i is a document vector in the set C_j and |C_j| is the number of documents in cluster C_j.

3.2.2 Stop Words Removal
A major form of pre-processing is to filter out useless data. In natural language processing, such useless words are referred to as stop words [16]. Common words such as a, an, but, and, of, the, etc. are removed while indexing the entries.

3.2.3 Tokenization
The given document is treated as a string and split into single words, i.e. the document string is divided into units or tokens, each of which has no extrinsic or exploitable meaning or value [9]. Through a tokenization system, a token is a reference (i.e. identifier) that maps back to the sensitive data. Tokens are created from random numbers, and the original data is mapped to tokens using methods that make the tokens infeasible to reverse in the absence of the tokenization system.

3.2.4 Normalization
Normalization divides larger tables into smaller tables and links them using relationships [14]. It restructures a relational database into a normal form in order to reduce data redundancy and improve data integrity. Repeated storage of the same information leads to the update anomaly problem, which can be overcome with the help of the normalization process.

$$x_{new} = \frac{x - \mu}{\sigma}$$ … (2)

3.2.5 N-Gram
N-gram models of natural language sequences utilize the statistical properties of n-grams. A contiguous sequence of n items from a given sample of text or speech is called an n-gram or shingle [10]. Based on its size, an n-gram is classified as:
• Unigram (Size 1)
• Bigram or Digram (Size 2)
• Trigram (Size 3)

3.2.5.1 Unigram
The unigram model posits that each word occurrence in a document is independent of all other word occurrences [11] [13]. Document generation is modelled as a sequence of dice rolls with a fixed probability of occurrence associated with each word. The product of the word probabilities gives the chance of observing a given document.

$$P(w_i \mid w_1 \ldots w_{i-1}) \approx P(w_i) = \frac{c(w_i)}{\sum_{\tilde{w}} c(\tilde{w})}$$ … (3)

3.2.5.2 Bigram
The frequency distribution of every bigram in a string is commonly used for simple statistical analysis of text in many applications, including computational linguistics, cryptography, speech recognition, and so on.

3.2.5.3 Trigram
Cryptanalytic frequency analysis has identified 16 common character-level trigrams in English. Context is very important: differing rankings and percentages are easily obtained by drawing from different sample sizes, different document types (poetry, science fiction, technology documentation) and different writing levels.

3.3 FEATURE SELECTION
Feature selection differs from dimensionality reduction, although both seek to reduce the number of attributes in the dataset [15]. Dimensionality reduction methods create new combinations of attributes, whereas feature selection methods include and exclude attributes present in the data without changing them. Feature selection:
• enables the machine learning algorithm to train faster;
• reduces the complexity of a model and makes it easier to interpret;
• improves the accuracy of a model if the right subset is chosen;
• reduces overfitting.
Filter methods are generally used as a preprocessing step. The selection of features is independent of any machine learning algorithm. Instead, features are selected on the basis of their scores in various statistical tests for their correlation with the outcome variable. In wrapper methods, a subset of features is used to train a model. Based on the inferences drawn from the previous model, features are added to or removed from the subset. These methods are usually computationally very expensive [12]. Embedded methods combine the qualities of filter and wrapper methods. They are implemented by algorithms that have their own built-in feature selection methods.

3.3.1 Chi Square (CHI)
The Chi-Squared Statistic (CHI) measures the association between the word feature and its class or category [15]. CHI, as a common
statistical test, represents the divergence from the distribution expected under the assumption that the feature occurrence is perfectly independent of the class value.

… (4)

3.3.2 Information Gain (IG)
IG aims at removing features that provide little information, based on the presence and absence of terms in the document. Information gain is used as a feature goodness criterion in machine-learning-based classification [12]. IG measures the information obtained for class prediction of an arbitrary text document by evaluating the presence or absence of a feature in the text document.

… (5)

where R is the response value and |T| is the number of observations.

3.3.3 Odds Ratio (OR)
The odds ratio measures the odds of the word occurring in the positive class normalized by that of the negative class [5]. The basic idea is that the distribution of features in the relevant documents is different from the distribution of features in the non-relevant documents.

$$\text{Odds ratio} = \frac{P_{G1} / (1 - P_{G1})}{P_{G2} / (1 - P_{G2})}$$ … (6)

3.4 Sequential Minimal Optimization (SMO)
Training a support vector machine requires the solution of a very large quadratic programming (QP) optimization problem. SMO breaks this large QP problem into a series of smallest possible QP problems [18][14]. These small QP problems are solved analytically, which avoids using a time-consuming numerical QP optimization as an inner loop. The amount of memory required for SMO is linear in the training set size, which allows SMO to handle very large training sets. Because matrix computation is avoided, SMO scales somewhere between linear and quadratic in the training set size for various test problems, while the standard chunking SVM algorithm scales somewhere between linear and cubic in the training set size. SMO's computation time is dominated by SVM evaluation, hence SMO is fastest for linear SVMs and sparse data sets. On real-world sparse data sets, SMO can be more than 1000 times faster than the chunking algorithm [14].

III. PROPOSED SENTIMENT CLASSIFICATION MODEL
Sentiment classification is a special task of text classification whose objective is to classify a text according to the sentiment polarity of the opinions it contains: favorable or unfavorable, positive or negative. Sentiment analysis aims to improve the accuracy of the classifiers on which business decisions are based. Statements received from the web consist of words that the classification model must process. Classification algorithms are used for enhancing business on e-commerce sites, where the accuracy of the classifiers has to be increased. Metaheuristic algorithms can be used in selecting the features to enhance the accuracy of the classifiers [12].

Classification algorithms such as Naïve Bayes (NB), K-Nearest Neighbor (KNN), Logistic Model Tree (LMT) and Radial Basis Function (RBF) are compared for text classification. The interaction between feature subset search and model selection in the wrapper-based approach results in higher performance than the filter-based approach. The choice of classification algorithm in a wrapper-based approach decides the accuracy of the classifier. Choosing the best classification algorithm to work with the feature selection algorithm in a wrapper-based approach therefore plays a vital role in sentiment classification. Categorization of content into polarity levels such as positive, negative, and neutral is known as sentiment classification.

The Naïve Bayes algorithm for sentiment analysis determines whether a document can be classified as expressing a positive or a negative sentiment. KNN is quite different from the other classification algorithms [11]. Instead of training a model on the available data set, it uses a lazy learning approach. The algorithm does not generalize a model before a query is made. Based on the input query, it forms a model specific to that query.

Fig. 1. Diagrammatic representation of accuracies for Naïve Bayes and KNN.
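
As an illustration of the lazy-learning behaviour of KNN described above, the following minimal Python sketch only stores the training reviews when it is fitted and does all of its work when a query arrives. The class name LazyKNN, the bag-of-words representation, the cosine similarity measure, the value k = 3 and the toy reviews are assumptions made for this example; they are not taken from the experiments reported in this paper.

```python
from collections import Counter
import math


def bag_of_words(text):
    """Lower-case the text, split on whitespace and count the tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class LazyKNN:
    """Hypothetical k-nearest-neighbour sentiment classifier (illustration only)."""

    def __init__(self, k=3):
        self.k = k
        self.examples = []

    def fit(self, texts, labels):
        # Lazy learning: no model is built here, the data is only stored.
        self.examples = [(bag_of_words(t), y) for t, y in zip(texts, labels)]

    def predict(self, text):
        # All computation happens at query time: rank the stored examples
        # by similarity to the query and take a majority vote of the top k.
        query = bag_of_words(text)
        ranked = sorted(
            self.examples,
            key=lambda example: cosine_similarity(query, example[0]),
            reverse=True,
        )
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


# Toy training reviews (assumed data, for illustration only).
train_texts = [
    "good great phone",
    "love this great camera",
    "excellent good battery",
    "bad terrible screen",
    "hate this awful phone",
    "poor bad battery",
]
train_labels = ["positive"] * 3 + ["negative"] * 3

knn = LazyKNN(k=3)
knn.fit(train_texts, train_labels)
print(knn.predict("great good camera"))
print(knn.predict("bad awful screen"))
```

Because fit only stores the examples, the cost of this sketch is paid at prediction time, where each query is compared with every stored review; this is the trade-off implied by not generalizing a model before a query is made.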


Classification involves extracting the features in the given statement and classifying the input statement based on the polarity of the extracted features. Once the input reviews are classified, the classification accuracy of the different algorithms is measured by comparing the actual sentiment of the reviews with the classified sentiment [13]. Feature extraction is a process in data mining that reduces the amount of resources required to describe a large set of data. A major problem in mining and analysing complex data is the large number of attributes in the data set. Applying feature extraction techniques to the data set before it is given as input to the classifier improves the accuracy of the classifier model [10].

In Figure 1, the NB and KNN classification techniques are compared in terms of accuracy. For a training dataset of size 100, the Naïve Bayes technique offers an accuracy of 56.78%, while the KNN technique offers 47.64%. Similarly, the accuracy varies across the other training datasets. Hence, from Figure 1 it is evident that NB gives the better accuracy.

IV. CONCLUSION
In efficient sentiment classification models, the algorithms used at the feature selection step play an important role. The feature selection process improves the overall accuracy of the classifier. Features selected on the basis of mathematical formulas are easily implemented and used with a classifier. The increasing demand on sentiment analysis is to improve the accuracy of the classifiers on which important business decisions are based. In general, the list of features for feature selection is not fixed in advance. Statements or tweets received from the web contain perplexing words, which makes the classification process more difficult. To make use of classification algorithms for enhancing business on e-commerce sites, the accuracy of the classifiers has to be increased.

REFERENCES
[1] Ema Kusen and Mark Strembeck, "Something draws near, I can feel it: An analysis of human and bot emotion-exchange motifs on Twitter", Online Social Networks and Media, Vol. 10-11, pp. 1-17, 2019.
[2] Kiichi Tago and Qun Jin, "Influence Analysis of Emotional Behaviors and User Relationships Based on Twitter Data", Tsinghua Science and Technology, Vol. 23, No. 1, pp. 104-113, 2018.
[3] Avinash Chandra Pandey, Dharmveer Singh Rajpoot and Mukesh Saraswat, "Twitter sentiment analysis using hybrid cuckoo search method", Information Processing and Management, Vol. 53, pp. 764-779, 2017.
[4] Neha Singh, Nirmalya Roy and Aryya Gangopadhyay, "Analyzing The Emotions of Crowd For Improving The Emergency Response Services", Pervasive and Mobile Computing, Vol. 58, pp. 1-33, 2019.
[5] Chae Won Park and Dae Ryong Seo, "Sentiment Analysis of Twitter Corpus Related to Artificial Intelligence Assistants", International Conference on Industrial Engineering and Applications, pp. 495-498, 2018.
[6] Kashfia Sailunaz and Reda Alhajj, "Emotion and Sentiment Analysis from Twitter Text", Journal of Computational Science, pp. 1-42, 2019.
[7] Ankush Chatterjee, Umang Gupta, Manoj Kumar Chinnakotla, Radhakrishnan Srikanth, Michel Galley and Puneet Agrawal, "Understanding emotions in text using deep learning and big data", Computers in Human Behavior, Vol. 93, pp. 309-317, 2019.
[8] Ravinder Ahuja, Aakarsha Chug, Shruti Kohli, Shaurya Gupta and Pratyush Ahuja, "The Impact of Features Extraction on the Sentiment Analysis", Procedia Computer Science, Vol. 152, pp. 341-348, 2019.
[9] M. Ghiassi and S. Lee, "A Domain Transferable Lexicon Set for Twitter Sentiment Analysis Using a Supervised Machine Learning Approach", Expert Systems with Applications, Vol. 106, pp. 197-216, 2018.
[10] Fazeel Abid, Muhammad Alam, Muhammad Yasir and Chen Li, "Sentiment analysis through recurrent variants latterly on convolutional neural network of Twitter", Future Generation Computer Systems, Vol. 95, pp. 292-308, 2019.
[11] Eric S. Tellez, Sabino Miranda-Jiménez, Mario Graff, Daniela Moctezuma, Oscar S. Siordia and Elio A. Villasenor, "A case study of Spanish text transformations for twitter sentiment analysis", Expert Systems with Applications, Vol. 81, pp. 457-471, 2017.
[12] Pragya Tripathi, Santosh Kr Vishwakarma and Ajay Lala, "Sentiment Analysis of English Tweets Using RapidMiner", International Conference on Computational Intelligence and Communication Networks, pp. 668-672, 2015.
[13] Ahmed Sulaiman M. Alharbi and Elise de Doncker, "Twitter Sentiment Analysis with a Deep Neural Network: An Enhanced Approach using User Behavioral Information", Cognitive Systems
Research, Vol. 54, pp. 50-61, 2019.
[14] Mondher Bouazizi and Tomoaki Ohtsuki, "Multi-Class Sentiment Analysis on Twitter: Classification Performance and Challenges", Big Data Mining and Analytics, Vol. 2, No. 3, pp. 181-194, 2019.
[15] Hai Ha Do, PWC Prasad, Angelika Maag and Abeer Alsadoon, "Deep Learning for Aspect-Based Sentiment Analysis: A Comparative Review", Expert Systems with Applications, Vol. 118, pp. 272-299, 2019.
[16] Bowen Zhang, Xiaofei Xu, Xutao Li, Xiaojun Chen, Yunming Ye and Zhongjie Wang, "Sentiment analysis through critic learning for optimizing convolutional neural networks with rules", Neurocomputing, Vol. 356, pp. 21-30, 2019.
[17] Asad Abdi, Siti Mariyam Shamsuddin, Shafaatunnur Hasan and Jalil Piran, "Deep learning-based sentiment classification of evaluative text based on Multi-feature fusion", Information Processing & Management, Vol. 56, No. 4, pp. 1245-1259, 2019.
[18] Shiyang Liao, Junbo Wang, Ruiyun Yu, Koichi Sato and Zixue Cheng, "CNN for situations understanding based on sentiment analysis of twitter data", Procedia Computer Science, Vol. 111, pp. 376-381, 2017.
