(ICITISEE)
Abstract—Social media is a medium that many users need in order to connect and communicate with other users. One of the most widely used social media platforms is Twitter, which contains opinions or short messages called tweets. A company also needs feedback from its customers to find out their view of the service provided. Therefore, sentiment analysis is needed to classify the sentiment expressed about the company. This research uses a dataset consisting of a collection of tweets about US airlines. Because the dataset is provided on Kaggle and already includes several metadata fields, the feature selection experiment can be carried out quickly. Feature selection in this research uses the Mutual Information method. The method was chosen because previous references report that it is effective in measuring the correlation between one attribute and another. The results obtained show that training data built from features selected using Mutual Information gives better results. However, Mutual Information is also compared with other feature selection methods, such as Chi Square and ANOVA F, with regard to the length of the selection process and the results for both methods.

Keywords—sentiment analysis, twitter, feature selection, mutual information.

I. INTRODUCTION

Social media has become a reference for companies that aim to understand customer behavior today [1]. The results of the analysis can also be used as a guideline for policies related to the future business direction of the company concerned. This analysis results in the classification of opinions or sentiments obtained from tweets provided in the form of datasets by Kaggle. The collection of texts from Twitter is therefore very valuable, because it contains hidden information that must be revealed for the company's purposes. Revealing such information involves data mining with various types of classifiers. The main task of this data mining process is to convert unstructured text data into structured data, since structured text data is a requirement for the data mining process. In addition, sentiment analysis involves a Natural Language Processing approach to extract meaning from the collection of words in a tweet.

One of the social media platforms that is often used is Twitter. Twitter currently has about 330 million users worldwide and generates about 8000 data items every second [2][3]. On Twitter, a tweet is a collection of texts or sentences that can contain news content, opinions, arguments, and several other types of sentences [4]. Tweets can therefore hold useful information for companies. Twitter users are not only young people but come from various circles, including government and business. Twitter can be used through various platforms such as mobile devices, websites, or desktop applications connected to the internet. This ease of access has resulted in a growing number of users over time.

An airline is an organization that provides flight services for passengers or goods. Airlines rent or own aircraft to provide these services and can form partnerships or alliances with other airlines for mutual benefit. An airline needs feedback from its customers to find out their views on its services. The desired feedback is difficult to obtain if it is collected manually using questionnaires, sampling, or interviews with a sample of customers.

Opinions from customers, and even from the general public, are usually found in social media comments. One of the social media platforms that contains these opinions is Twitter. However, Twitter has many tweets, some of which are news and some of which are opinions. Sentiment analysis is used to sort out the tweets that express an opinion.

This study aims to discuss aspects of the approach to sentiment analysis of an airline dataset. The dataset is analyzed to generate a sentiment classification. However, the text representation used in the classification process generates high-dimensional features, so the dimensionality needs to be reduced. Dimensionality reduction can be done using feature selection. This research uses Mutual Information (MI) for feature selection. The choice of this method is based on earlier references that report it to be effective in measuring the correlation between one attribute and another. This is expected to increase accuracy and reduce processing time when training on sentiment analysis data. Furthermore, the final stage of this research is expected to produce sentiment analysis software that can visualize trends among airline service users.

II. SENTIMENT ANALYSIS

Sentiment analysis is also called opinion mining. It uses computational linguistics and data mining to convert text data into information. The objective of sentiment analysis is to detect a person's mood, behavior, and opinions from existing text documents [5]. The sentiment can be positive, negative, or neutral. However, the data is initially unstructured. The main challenge in sentiment analysis is therefore to transform unstructured data into structured data that can be processed further.

Sentiment involves a sentiment holder, an emotional disposition such as positive or negative polarity, and an object. This structure is shown in Figure 1. A sentiment or opinion is an expression of the attitude of someone who gives an opinion or comment, either on social media or in other systems [6]. When the opinion owner encounters a situation where the object is
involved, the opinion can be identified or traced through the impact of the interaction with the object or entity. These opinions have emotional tendencies, which can be divided into two types: positive and negative. The task of sentiment analysis is to assign one of these two opinion types to input opinions that have no previous label.

Fig. 1. Schematic Structure of Sentiment Analysis

III. FEATURE SELECTION

Feature selection methods can be divided into lexicon-based methods, which require human annotation, and statistical methods, which provide annotation automatically [7]. Statistical methods are the ones most often used in the context of sentiment analysis. The lexicon-based approach usually starts with a small collection of words. This collection is then extended through synonym detection or online resources to obtain a larger lexicon. Feature selection techniques treat a document either as a collection of words, called a bag of words (BOW), or as a string that retains the order of the words in the document. BOW is used more often because of its simplicity in the classification process.

The feature selection process is shown in Figure 2. First, the labeled text documents are converted into vectors with many features based on the collection of words [8]. This is the Bag of Words process mentioned earlier. The number of resulting features is relatively large. It is therefore necessary to discard features and keep only those with the highest scores. A feature with a higher score carries more important information than a feature with a lower score. The documents with the reduced feature set can then be passed on to the training stage.

Fig. 2. Feature Selection Phases

The main idea of the various feature selection methods is the same. Each algorithm computes a score for every feature, and the important features are then chosen according to a predetermined number of features or a threshold value on the feature's score [8][9]. Several feature selection methods are often used, namely:

1. Document Frequency (DF): The DF of a term is the number of documents in which the term appears. This method is the simplest way to build a text representation and produce features.

2. CHI Statistics: Chi Square (CHI) is one of the most commonly used word selection algorithms. The algorithm is based on statistical principles. The Chi Square method measures the divergence of the observed distribution from the one expected under the assumption that the occurrence of a text feature is independent of the sentiment class.

3. Information Gain: Information Gain is also a method that is often used in text mining. It measures the importance of a text feature by the reduction in uncertainty obtained when the value of the feature is known. The method uses entropy to quantify this uncertainty.

4. Standard Deviation: Standard deviation is a statistical measure that is also used in feature selection. It measures the spread of points or individuals around the computed mean. In this case, a lower standard deviation is recommended, which indicates that the points are concentrated close to the mean. However, a low standard deviation also indicates that the feature is close to other features.

5. Mutual Information: Mutual Information (MI) is a filter-based method that is often used. It measures the importance of a feature by its correlation with the class labels. It is assumed that a feature with a strong correlation with the class variable will improve classification performance.

IV. MUTUAL INFORMATION

Mutual Information (MI) is a feature selection method. It is an effective method for measuring correlation between variables [10][11]. Suppose that x and y are discrete random variables with N observations each. The relationship between x and y can then be expressed by the following equation:

\[
I(x; y) = H(y) - H(y \mid x) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} \tag{1}
\]

where H(y) is the entropy of y, a measure of the uncertainty of a discrete random variable [10]. H(y|x) is the conditional entropy of y given x, and p(·,·) is the joint probability function. MI indicates how much information x and y share. The variables x and y may be dependent or independent, so MI is either positive or equal to zero (it is zero when x and y are independent). Minimum Redundancy Maximum Relevance (mRMR), which uses MI directly for both its redundancy and its relevance terms, is one of the popular developments of the MI method. The relationship in Equation (1) is illustrated in Figure 3. The ranking criterion of the mRMR method can be expressed by the following equation:

\[
J_{mRMR}(f_k) = \max_{f_k \in F - S} \left[ I(f_k; C) - \frac{1}{|S|} \sum_{f_1 \in S} I(f_k; f_1) \right] \tag{2}
\]

where I(·;·) is the mutual information of Equation (1) and f_k is a candidate feature [8]. Furthermore, all available features are denoted by F, while the selected features are denoted by S; the features in S are denoted by f_1;
Authorized licensed use limited to: Somaiya University. Downloaded on December 28,2023 at 13:30:14 UTC from IEEE Xplore. Restrictions apply.
2019 4th International Conference on Information Technology, Information Systems and Electrical Engineering
(ICITISEE)
then the class or label in the data is denoted by C. The second term of Equation (2) measures the redundancy between a candidate feature and the features already selected. Furthermore, the criterion considers only pairs of variables, so it does not account for joint relationships or conditional redundancies determined by a third or further variable.

experiments are then discussed and compiled into a paper to be published in accordance with the predetermined outcomes.

Figure 5 shows the flow of the sentiment analysis process for obtaining the sentiment of tweets. The figure contains several important steps, as follows:
Feature selection is needed to choose the features that have high value. Not all of the generated features need to be used in the classification process, because a large number of features requires a long processing time. This study uses feature selection with the mutual information method. The features that score highly under this method are retained, whereas features with lower scores are not used.

dataset with dimensions of 14641 x 10029. The Bag of Words process therefore produces 10029 features. Because the example in Table II uses a total of 10 features, these are the 10 features with the largest MI values out of the original 10029. The features scored with MI are selected using SelectKBest. This algorithm keeps the k features with the best MI values among all existing features, where k is determined at the beginning.
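The selection step described above can be illustrated with a minimal, dependency-free sketch. It is not the code used in this study: it builds binary bag-of-words presence features, scores each term against the sentiment label with Equation (1), and keeps the k best-scoring terms, which is the same idea as SelectKBest. The function names `mutual_information` and `select_k_best`, the toy tweets, and the alphabetical tie-breaking rule are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Equation (1) estimated from counts: sum of p(x,y) * log(p(x,y) / (p(x) p(y)))."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def select_k_best(docs, labels, k):
    """Score every vocabulary term by MI with the label and keep the top k."""
    tokenized = [set(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*tokenized))
    scores = {}
    for term in vocab:
        # Binary bag-of-words feature: 1 if the term occurs in the document
        presence = [int(term in tokens) for tokens in tokenized]
        scores[term] = mutual_information(presence, labels)
    # Highest score first; ties are broken alphabetically (an arbitrary choice)
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return ranked[:k], scores


# Hypothetical toy tweets, not taken from the Kaggle dataset
docs = [
    "great flight thanks",
    "flight delayed again",
    "great crew thanks",
    "delayed and rude crew",
]
labels = ["positive", "negative", "positive", "negative"]

top_terms, scores = select_k_best(docs, labels, k=3)
print(top_terms)
```

In this toy example the terms "delayed", "great", and "thanks" each occur in exactly one class, so they receive the maximum score of log 2, while "flight" occurs equally in both classes and scores zero. The scores are in nats because the natural logarithm is used.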
2 positive 0.16544
3 neutral 0.08375
4 negative 0.08375
5 negative 0.07264
6 negative 0.05764
7 positive 0.04629
8 positive 0.04622
9 positive 0.04404
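As an illustration of the mRMR criterion in Equation (2), the following is a minimal greedy sketch, assuming discrete features and count-based MI estimates. It is not part of the experiments reported here. The function name `mrmr_select`, the feature names `a`, `a_copy`, and `b`, and the toy data are hypothetical; the first feature is chosen by relevance alone because S is still empty at that point.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Count-based estimate of Equation (1), in nats."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def mrmr_select(features, target, k):
    """Greedily pick k feature names using relevance minus mean redundancy."""
    relevance = {name: mutual_information(col, target)
                 for name, col in features.items()}
    selected = []
    candidates = set(features)
    while candidates and len(selected) < k:
        def score(name):
            if not selected:
                # S is empty, so only the relevance term I(f_k; C) applies
                return relevance[name]
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        # sorted() makes tie-breaking deterministic (alphabetical)
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical data: the class is 1 only when both a and b are 1,
# and a_copy duplicates a, so it adds no new information.
features = {
    "a":      [0, 0, 0, 0, 1, 1, 1, 1],
    "a_copy": [0, 0, 0, 0, 1, 1, 1, 1],
    "b":      [0, 0, 1, 1, 0, 0, 1, 1],
}
target = [0, 0, 0, 0, 0, 0, 1, 1]

print(mrmr_select(features, target, k=2))
```

All three features have the same relevance to the class here, so ranking by MI alone could keep both `a` and its duplicate. The redundancy term penalizes `a_copy` once `a` is selected, and the sketch returns `a` and `b` instead.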
[7] W. Medhat, A. Hassan, and H. Korashy, “Sentiment analysis algorithms and applications: A survey,” Ain Shams Eng. J., 2014.
[8] Y. Liu, J. W. Bi, and Z. P. Fan, “Multi-class sentiment classification: The experimental comparisons of feature selection and machine learning algorithms,” Expert Syst. Appl., vol. 80, pp. 323–339, 2017.
[9] A. Yousefpour, R. Ibrahim, and H. N. A. Hamed, “Ordinal-based and frequency-based integration of feature selection methods for sentiment analysis,” Expert Syst. Appl., 2017.
[11] M. J. Tian, R. Y. Cui, and Z. H. Huang, “Automatic Extraction Method for Specific Domain Terms Based on Structural Features and Mutual Information,” Proc. - 2018 5th Int. Conf. Inf. Sci. Control Eng. ICISCE 2018, pp. 147–150, 2019.
[12] F. S. Nurfikri, M. S. Mubarok, and Adiwijaya, “News topic classification using mutual information and Bayesian network,” 2018 6th Int. Conf. Inf. Commun. Technol. ICoICT 2018, vol. 0, no. c, pp. 162–166, 2018.