Review Paper
NIM : 03051382227109
INTRODUCTION
Twitter is one of the most popular social media platforms in the world, and Indonesia ranks
fifth in the world by number of users. Every day the Twitter servers receive large amounts of
tweet data, which can be mined for various purposes. One of these is the use of Twitter as a
medium for promoting a product or service, as in the research conducted by James Luke and
Suharjito.
For large-scale data, the search process needs to be fast, so the data must first be grouped.
Naïve Bayes is a classification learning algorithm with good computational efficiency and
accuracy, especially for high-dimensional data in large volumes. For this reason, the study
sets out to demonstrate the ability of the Naïve Bayes classifier to classify tweets
containing information about a product or service, with a case study conducted at
PT Bobobobo, Jakarta.
METHOD
Data mining is the process of discovering knowledge from very large data sets. Text mining
is a field of data mining that aims to collect useful information from natural-language text
data, that is, the process of analyzing text data and extracting useful information for a
particular purpose.
The algorithm used in this study is the Naïve Bayes classifier (NBC), a machine learning
method that uses the probability and statistics calculations put forward by the English
scientist Thomas Bayes, namely predicting future probabilities based on past experience.
The classification process is carried out based on the following equation:
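For reference, a standard statement of the Naïve Bayes rule is sketched below. The symbols are assumptions made for illustration ($c$ for a class such as product or service, $w_1, \dots, w_n$ for the words of a tweet, $C$ for the set of classes) and may differ from the notation of the reviewed study.

```latex
P(c \mid w_1, \dots, w_n) \;\propto\; P(c) \prod_{i=1}^{n} P(w_i \mid c),
\qquad
\hat{c} \;=\; \arg\max_{c \in C} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)
```

The tweet is assigned to the class $\hat{c}$ with the largest score; the "naïve" part is the assumption that the words are conditionally independent given the class.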
Data:
The data used in this study are Indonesian-language tweets from the Jakarta area, collected
from June 2013 to February 2014. The data are then divided into two categories/classes:
products and services.
Product Category:
"Looking for a batik dress: if you get what you like, it's a bit expensive in the pocket
or the dress...a pair with a shirt..."
Service Category:
"Looking for a cheap hotel near the center of Surabaya? any suggestions?"
After the tweet data are collected, they are used as a training dataset and grouped by
product and service class/attribute. The next process is classification using the Naïve
Bayes Classifier (NBC) algorithm to measure the level of accuracy. The tweet classification
process is depicted as a flowchart in Figure 2.
Figure 2 Flowchart of classification using NBC
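The training and classification steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the function names, the whitespace tokenization, the add-one (Laplace) smoothing, and the toy training tweets are all assumptions.

```python
import math
from collections import Counter, defaultdict


def train_nbc(labeled_tweets):
    """Count class and word frequencies from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_tweets:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log-probability score."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Log prior P(c), estimated from class frequencies.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing (an assumption) avoids log(0) for unseen words.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data, not taken from the study.
TRAIN = [
    ("looking for a batik dress", "product"),
    ("new shoes and a shirt on sale", "product"),
    ("looking for a cheap hotel", "service"),
    ("need a taxi to the airport hotel", "service"),
]

model = train_nbc(TRAIN)
print(classify("cheap hotel near the center", model))
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once tweets and vocabularies grow large.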
After the tweets are classified into product and service classes, the next step is to select
popular/trend words and compare them with the tweets that will be used for automatic
promotion. The flow of the selection and promotion process is shown in Figure 3.
Figure 3 Automatic selection and promotion flowchart
The steps for selecting popular/trend words and tweets that will be used as promotional
materials are as follows:
1. Tokenization
The stage of cutting the input string into the words that compose it. Words are separated
from the tweet as tokens. A word is considered valid if it consists of 3-25 letters and is
not a link or URL.
2. Stopwords
These are words that have no effect on the classification process. The results of this
process are stored in the database.
3. Trend words
A database query returns 5 words from the product and service categories.
4. Promotional tweets
Promotional tweets are written by the Twitter admin. Each tweet is specified with a
promotional period, keywords, and a match score with the trend words. The tweets are then
stored in the database.
5. Automatic tweets
A database query finds promotional tweets that match the trend words. If every criterion
matches, the system tweets automatically.
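The five steps above can be illustrated with a short Python sketch. It is written under several assumptions: in-memory lists stand in for the database, the stopword list is a hypothetical placeholder, trend words are taken to be the most frequent remaining words, and the match score is interpreted as the minimum number of a promotion's keywords that must appear among the trend words. The function names and promotion fields are invented for illustration.

```python
import re
from collections import Counter
from datetime import date

# A valid word has 3-25 letters, as stated in the tokenization step.
WORD_RE = re.compile(r"^[a-z]{3,25}$")
# Hypothetical stopword list; the study's actual list is not given.
STOPWORDS = {"the", "for", "and", "any", "near"}


def tokenize(tweet):
    """Split a tweet into valid words, skipping links and URLs."""
    tokens = []
    for raw in tweet.lower().split():
        if raw.startswith(("http://", "https://", "www.")):
            continue
        word = raw.strip(".,!?:;\"'()")
        if WORD_RE.match(word):
            tokens.append(word)
    return tokens


def remove_stopwords(tokens):
    """Drop words that carry no information for classification."""
    return [t for t in tokens if t not in STOPWORDS]


def trend_words(tweets, top_n=5):
    """Return the top_n most frequent non-stopword tokens."""
    counts = Counter()
    for tweet in tweets:
        counts.update(remove_stopwords(tokenize(tweet)))
    return [word for word, _ in counts.most_common(top_n)]


def select_promotions(promos, trends, today):
    """Pick promotional tweets that are active and match enough trend words."""
    selected = []
    for promo in promos:
        active = promo["start"] <= today <= promo["end"]
        score = len(set(promo["keywords"]) & set(trends))
        if active and score >= promo["min_score"]:
            selected.append(promo["text"])
    return selected


# Hypothetical data for illustration only.
tweets = ["cheap hotel in Jakarta", "hotel deals cheap hotel"]
promos = [{
    "text": "Cheap hotel deals this week!",
    "keywords": ["hotel", "cheap"],
    "min_score": 2,
    "start": date(2014, 9, 1),
    "end": date(2014, 9, 30),
}]
trends = trend_words(tweets, top_n=2)
print(select_promotions(promos, trends, date(2014, 9, 15)))
```

In a real deployment the selected texts would be handed to whatever component posts to Twitter; that part is left out here because the source does not describe it.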
EVALUATION
Varying compositions of the training data are used for the product category tweets, with
the results shown in Table 2.
Using the same variation in the composition of the training data for service category
tweets, the results are as shown in Table 3.
The same variation in training data composition is applied to the combination of product and service category tweets, with the results shown in Table 4.
Table 4 NBC results (combination of product and service tweet data)
From the results of the test data above, the NBC algorithm has a fairly good level of
accuracy.
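The source does not spell out how the accuracy figures are computed. A common definition, assumed here, is the percentage of test tweets whose predicted class equals the true class; the function name and the sample labels below are hypothetical.

```python
def accuracy(predicted, actual):
    """Percentage of predictions that equal the true labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Hypothetical labels: three of four predictions are correct.
predicted = ["product", "service", "service", "product"]
actual = ["product", "service", "product", "product"]
print(accuracy(predicted, actual))
```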
In the experiments conducted from September 2014 to February 2015, the results table shows
that the increase in retweets and mentions greatly affects the increase in follower
engagement. The results of the comparison of follower engagement before and after
implementation are shown in Figure 5.
Figure 5 Levels of engagement before and after implementation
This study gave positive results on increasing follower engagement.
CONCLUSION
1. The NBC algorithm has a high level of accuracy in the classification process, as indicated
by an accuracy rate of 90.31% using product category test data and 80.91% using service
category test data. The combination of the two gives an accuracy of 83.51%.
2. The number of stopwords can determine the trend words obtained from a collection of
product and service category tweets.
3. Activity on Twitter increased during this study, reaching 39% for tweets, 120% for
mentions and 69% for new followers. Retweets and mentions have an impact on follower
engagement results.
4. The tweet engagement rate after this study was fairly high, with a highest value of 17.44%
and a lowest of 4.72%, compared with a previous highest of 3.80% and lowest of 2.90%.
5. Using Twitter as a promotional medium gives quite satisfactory results. Before tweeting,
we can analyze the trend words / trending topics from followers, which yields a good
response from followers.