Lecturer: Barlin, S.T., M.Eng., Ph.D.

Name: Muhammad Yudana Aditya Pratama

NIM: 03051382227109

Review paper

INTRODUCTION

Twitter is one of the most popular social media platforms in the world, and Indonesia ranks
5th among countries by number of users. Every day the Twitter servers receive large amounts
of tweet data, which can be mined for various purposes. One such purpose is using Twitter as
a medium for promoting a product or service, as in the research conducted by James Luke and
Suharjito.
For large-scale data, the search process needs to be fast, so the data must first be grouped.
Naïve Bayes is a classification learning algorithm with computational efficiency and good
accuracy, especially for high-dimensional and large data sets. For this reason, the study sets
out to demonstrate the ability of the Naïve Bayes classifier to classify tweets containing
information about products and services, with a case study conducted at PT Bobobobo,
Jakarta.
METHOD

Data mining is the process of knowledge discovery from very large data sets. Text mining is a
field of data mining that aims to collect useful information from natural-language text; that
is, it is the process of analyzing text data and extracting information that is useful for a
particular purpose.

The algorithm used in this study is the Naïve Bayes classifier (NBC). The Naïve Bayes
classifier is a machine learning method that uses the probability and statistics calculations
put forward by the English scientist Thomas Bayes; that is, it predicts future probabilities
based on past experience.
Classification is carried out based on the following equation:
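
The equation is not reproduced in this text. As a reference point, the standard form of Bayes' theorem on which NBC is based, together with the decision rule that follows from the naive independence assumption, can be written as below. The symbols c (a class, here product or service) and w_1, ..., w_n (the words of a tweet) are notation assumed for this sketch, not taken from the paper.

```latex
P(c \mid w_1, \dots, w_n) = \frac{P(c)\, P(w_1, \dots, w_n \mid c)}{P(w_1, \dots, w_n)}

\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)
```

The denominator is the same for every class, so it can be dropped when only the most probable class is needed.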

The research process is shown in Figure 1.


Figure 1 Research Framework

Data:
The data used in this study are Indonesian-language tweets from the Jakarta area, collected
from June 2013 to February 2014. The data are then divided into two categories/classes:
products and services.
Product Category:

"Looking at bags, shoes, headscarves, clothes, I want to buy everything"

"Looking for a batik dress: if you get what you like, it's a bit expensive in the pocket
or the dress...a pair with a shirt..."

Service Category:

“Yoga class after hours is quite flexible, cyn…”

“vacation to paris, traveling mahameru, comparative study of wear and tear”

"Looking for a cheap hotel near the center of Surabaya? any suggestions?"

Tweet classification with NBC

After the tweet data are collected, they are used as a training dataset and grouped by
class/attribute: product or service. The next step is classification using the Naïve Bayes
Classifier (NBC) algorithm to measure the level of accuracy. The tweet classification process
is shown as a flowchart in Figure 2.
Figure 2 Flowchart of classification using NBC
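
The classification step can be illustrated with a minimal sketch of a multinomial Naïve Bayes classifier. The paper's own implementation is not given, so everything here is an assumption: the class name `NaiveBayesClassifier`, the simple `tokenize` helper, the add-one (Laplace) smoothing, and the toy training tweets, which are invented for illustration only.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (assumed)."""

    def fit(self, texts, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {
            word for counts in self.word_counts.values() for word in counts
        }
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set, invented for illustration only.
train_texts = [
    "I want to buy shoes and clothes",
    "looking for a batik dress",
    "yoga class after hours",
    "looking for a cheap hotel",
]
train_labels = ["product", "product", "service", "service"]

model = NaiveBayesClassifier().fit(train_texts, train_labels)
prediction = model.predict("cheap yoga class")
print(prediction)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing keeps a word unseen in one class from forcing that class's probability to zero.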

Word and Tweet Trend Selection

After the tweets are classified into product and service classes, the next step is to select
popular/trend words and compare them with the tweets that will be posted automatically as
promotions. The flow of the tweet selection and promotion process is shown in Figure 3.
Figure 3 Automatic selection and promotion flowchart

The steps for selecting popular/trend words and tweets that will be used as promotional
materials are as follows:

1.Tokenization

This stage splits the input string into the words that compose it. Words are separated from
the tweet as tokens. A word is considered valid if it consists of 3-25 letters and is not a
link or URL.

2.Stopword

Stopwords are words that have no effect on the classification process, so they are removed.
The results of this process are stored in the database.

3.Trend words
A database query retrieves 5 trend words from the product and service categories.

4.Promoting tweets

Promotional tweets are written by the Twitter admin. Each tweet is assigned a promotional
grace period, keywords, and a match score with the trend words. The tweets are then stored in
the database.

5.Automatic tweeting

The database is queried to find promotional tweets that match the trend words. If every
criterion matches, the system tweets automatically.
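
The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's implementation: the data are kept in memory instead of a database, the `STOPWORDS` set is a hypothetical placeholder, trend words are assumed to be ranked by frequency, and `match_score` is one hypothetical way to score a promotional tweet's keywords against the trend words.

```python
from collections import Counter

# Hypothetical stopword list; the study's actual list is not given.
STOPWORDS = {"the", "and", "for", "want", "looking", "any", "near"}


def tokenize(tweet):
    """Split a tweet into valid words: 3-25 letters, not a link or URL."""
    tokens = []
    for raw in tweet.lower().split():
        if raw.startswith(("http://", "https://", "www.")):
            continue  # skip links and URLs
        word = raw.strip(".,!?;:\"'()…")
        if word.isalpha() and 3 <= len(word) <= 25:
            tokens.append(word)
    return tokens


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [token for token in tokens if token not in STOPWORDS]


def trend_words(tweets, top_n=5):
    """Return the top_n most frequent non-stopword tokens (assumed ranking)."""
    counts = Counter()
    for tweet in tweets:
        counts.update(remove_stopwords(tokenize(tweet)))
    return [word for word, _ in counts.most_common(top_n)]


def match_score(keywords, trends):
    """Fraction of a promotional tweet's keywords that are trend words."""
    if not keywords:
        return 0.0
    return sum(1 for keyword in keywords if keyword in trends) / len(keywords)


# Example tweets, partly invented for illustration.
tweets = [
    "Looking for a cheap hotel near the center of Surabaya? any suggestions? http://example.com",
    "cheap hotel deals for the weekend",
    "hotel with yoga class",
]
top_two = trend_words(tweets, top_n=2)
print(top_two)
print(match_score(["hotel", "spa"], top_two))
```

In a full system, a promotional tweet would be posted only if its score met the admin's threshold and the current date fell within its promotional period.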

EVALUATION

This research has two stages, as follows:


Analysis of the accuracy of the Naïve Bayes Classifier algorithm
To measure the accuracy of the Naïve Bayes classifier algorithm, the tweets are arranged into
three variations of training and test data, and every product and service tweet is tested.
The distribution of training data and test data is shown in Table 1.
Table 1 Composition and variation of training data
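
As a rough illustration of this evaluation, the sketch below splits a labeled data set into training and test portions and computes accuracy as the share of correct predictions. The function names and the 70/30 split in the example are hypothetical; the actual compositions are those in Table 1, which is not reproduced here.

```python
def split_data(items, train_percent):
    """Split items into (train, test), using the first train_percent as training data."""
    cut = len(items) * train_percent // 100
    return items[:cut], items[cut:]


def accuracy(predicted, actual):
    """Share of predictions that equal the true labels."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical 70/30 split of ten items.
train, test = split_data(list(range(10)), 70)
print(len(train), len(test))

# Two of the three predictions below match the true labels.
score = accuracy(["product", "service", "product"], ["product", "service", "service"])
print(round(score * 100, 2))
```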

Analysis of increasing follower engagement

Follower engagement performance is calculated with the following equation:
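
The equation is not reproduced in this text. A commonly used definition of engagement rate, given here only as an assumed stand-in and not necessarily the formula used in the study, divides the number of interactions by the number of followers:

```latex
\text{engagement rate} = \frac{\text{retweets} + \text{mentions}}{\text{followers}} \times 100\%
```
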
RESULTS AND DISCUSSION

Tweet classification results using NBC

The variations in training data composition were applied to all product category tweets, with
the results shown in Table 2 below.

Table 2 NBC results using product tweet data

Using the same variations in training data composition for service category tweets, the
results are shown in Table 3.

Table 3 NBC results using service tweet data

Applying the training data variations to the combined product and service category tweets gives
the results shown in Table 4.
Table 4 NBC results (combination of product and service tweet data)

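From the test results above, the NBC algorithm achieves a fairly good level of accuracy:
90.31% on product category test data, 80.91% on service category test data, and 83.51% on
the combination of the two.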

Tweet promotion automation results

The results of the experiments conducted from September 2014 to February 2015 are shown in
Figure 4.

Figure 4 Twitter engagement for @_bobobobo_


From the figure above, averages were calculated by combining the retweets, mentions, and new
followers before and after implementation. The results are shown in Table 5.

Table 5 Increased follower engagement

The table above shows that an increase in retweets and mentions greatly affects the increase
in follower engagement. A comparison of follower engagement before and after implementation
is shown in Figure 5.

Figure 5 Levels of engagement before and after implementation

This study gave positive results on increasing follower engagement.
CONCLUSION

The conclusions of this study are:

1.The NBC algorithm has a high level of accuracy in the classification process, as indicated
by an accuracy rate of 90.31% using product category test data and 80.91% using service
category test data. The combination of the two results in an accuracy of 83.51%.

2.The number of stopwords used affects which trend words are identified from a collection of
product and service category tweets.

3.Activity on Twitter increased as a result of this study: tweets rose by 39%, mentions by
120%, and new followers by 69%. Retweets and mentions have an impact on follower
engagement results.

4.The tweet engagement rate after this study was fairly high, with a highest value of 17.44%
and a lowest of 4.72%, compared with a highest of 3.80% and a lowest of 2.90% previously.

5.Using Twitter as a promotional medium gives quite satisfactory results. Before tweeting,
the trend words / trending topics among followers can be analyzed, which leads to a good
response from followers.
