from Twitter Using the Naïve Bayes Classifier Method and Support Vector Machine
Transportation is a major component of daily life and of government and social systems. The socio-demographic conditions of a region influence its transportation performance, and population density in particular strongly affects the capacity of transportation to serve the needs of the community. Urban areas tend to experience rapid population growth due to births and urbanization (Susantoro and Parikesit, 2004).
Vehicle data show that the number of vehicles reached a very significant level in 2018. According to Indonesian National Police data, the number of vehicles registered in Indonesia as of January 1, 2018 reached 111 million, or more precisely 111,571,239 vehicles. Motorbikes dominate this total, accounting for about 82% (91,085,532 motorbikes), followed by private cars at 12% (13,253,143 units); the remainder consists of buses, goods vehicles, and special vehicles. Meanwhile, based on regulations issued by the Ministry of Public Works and Public Housing through the Toll Road Regulatory Agency, toll roads have the following objectives and benefits.
Because prohibiting motorbikes from entering toll roads is considered incompatible with the third objective of toll road operation, namely increasing equity in development outcomes and justice, the Chairman of the Indonesian Parliament, Bambang Soesatyo, proposed that the government provide a special lane for motorbikes on toll roads (Kompas.com, 2019). Special motorbike lanes would allow motorbikes to use toll roads without violating the prohibition. Such lanes are also permitted by Government Regulation (PP) No. 44 of 2009, which amends Article 38 of PP No. 15 of 2005 concerning Toll Roads: paragraph (1a) states that toll roads can be equipped with special lanes for two-wheeled motor vehicles, provided that these lanes are physically separated from the lanes for vehicles with four or more wheels.
The concept of toll roads with special motorcycle lanes has already been implemented on several toll roads in Indonesia, namely the Surabaya-Madura section and the Bali Mandara section. The Toll Road Regulatory Agency (BPJT) is currently reviewing proposals to allow motorbikes on Jakarta toll road sections (Detikfinance, 2019). Based on the vehicle population data above, 62.81% of vehicles are located on Java. The Metro Jaya Regional Police recorded 13,765,308 registered motorbikes as of 2018, ranking DKI Jakarta province second in the number of registered motorbikes (Polri, 2018). Although DKI Jakarta ranks below East Java province, its area is only 664 km², much smaller than the 47,799 km² of East Java (Ministry of Home Affairs, 2016). The density of motorbikes in DKI Jakarta is therefore much higher than in East Java, which has led to the discourse on special motorcycle lanes on Jakarta toll road sections.
However, the discourse on allowing motorbikes onto toll roads has drawn both support and opposition. The Head of BPJT, Danang Parikesit, said that his agency is currently reviewing the proposal to allow motorbikes onto toll roads, and that one of the factors under consideration is accident risk (Detikfinance, 2019). Various groups in society have also voiced both support for and rejection of the proposal.
In the past, people expressed their opinions, criticism, and suggestions through print media, where not everyone had the ability to write or the opportunity to publish. With the development of communication technology, however, people now tend to express their opinions on social networks. One of the social networks most popular among internet users today is Twitter.
Twitter is a social networking and microblogging service that allows users to send and read text-based messages, known as tweets, of up to 140 characters (a limit raised to 280 characters in November 2017). Twitter usage is currently growing very rapidly. According to the Techno.id site, Twitter has 302 million active users, 80 percent of whom access it from mobile devices; 37 percent of Twitter users are 18-29 years old, and 25 percent are 30-49 years old. With that many active users, Twitter receives 500 million tweets every day. As of April 2018, Indonesia had the third largest number of Twitter users in the world, with 140 million users (The Next Web, 2018). Simplicity and ease of use are among the reasons why Twitter is very popular with Indonesians for communicating.
Previous studies have classified tweets into negative and positive opinions using the Naïve Bayes Classifier method and the SVM method. However, these studies focused mainly on sentiment toward particular companies or political parties, even though sentiment analysis can be applied to other topics as well, and the methods used were traditional ones. This study therefore formulates two research questions. First, what is public sentiment toward the discourse on allowing motorbikes onto toll roads, based on opinions from Twitter classified with the Naïve Bayes Classifier and Support Vector Machine methods? Second, which specific topics does the public frequently mention regarding this discourse?
Based on the formulation of the problem above, the objectives of this study are as follows.
1. Obtain the classification of public sentiment toward the discourse on motorbike entry to toll roads, based on opinions from Twitter, using the Naïve Bayes Classifier method.
2. Obtain the classification of public sentiment toward the discourse on motorbike entry to toll roads, based on opinions from Twitter, using the Support Vector Machine method.
3. Compare the classification results of the Naïve Bayes Classifier method with those of the Support Vector Machine method.
4. Identify the specific topics discussed in public tweets by forming clusters for negative and positive sentiment separately.
The expected benefits of this study are as follows.
1. Provide information about sentiment analysis of tweet data and how tweets are classified.
2. Provide a descriptive overview of the clusters of specific topics in public tweets, based on the negative and positive classifications obtained.
3. Serve as material for consideration by relevant agencies in making decisions and formulating appropriate policies or programs to improve equity in development outcomes and justice.
The scope of this study is limited as follows.
1. The data are in Indonesian.
2. The data are Twitter tweets collected from 29 January 2019 to 30 March 2019.
3. Only tweets manually labeled as positive or negative are classified.
This study analyzes the sentiment of Twitter data regarding the news on the discourse of allowing motorbikes onto toll roads. The data consist of tweets from the public that respond to or relate to news about this discourse. They were retrieved using search keywords directed at the Twitter accounts that disseminated the news, namely @detikcom and @detikoto.
Data were retrieved through a dedicated process using the Twitter Application Programming Interface (API), restricted to tweets in Indonesian. The Twitter API gives users access to tweet data so that information from Twitter can be processed and analyzed. The process requires creating a Consumer Key, Consumer Secret, Access Token, and Access Token Secret, which act as keys that identify the application to Twitter. After obtaining these four keys, the tweet data are retrieved and saved in Comma Separated Values (CSV) format, which can be opened in Excel. The tweet data were collected from January 29, 2019 to March 30, 2019.
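The retrieval and CSV step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' script: it assumes the Tweepy 4.x library (`tweepy.OAuth1UserHandler`, `tweepy.API`, `tweepy.Cursor`, `API.search_tweets`) and the v1.1 standard search endpoint. The environment variable names, the helper names `fetch_tweets` and `save_tweets_csv`, and the column names are hypothetical. Access tiers for the Twitter (now X) API have changed since 2019, and standard search only reaches back about a week, so the fetch part may not run today; the CSV part uses only the standard library.

```python
import csv
import os
from typing import Dict, List


def fetch_tweets(query: str, until: str, limit: int = 500) -> List[Dict[str, str]]:
    """Hypothetical helper: search Indonesian-language tweets with Tweepy 4.x.

    The four keys are read from hypothetical environment variables.
    `until` is a date string in YYYY-MM-DD format.
    """
    import tweepy  # imported lazily so the CSV helper works without Tweepy

    auth = tweepy.OAuth1UserHandler(
        os.environ["CONSUMER_KEY"], os.environ["CONSUMER_SECRET"],
        os.environ["ACCESS_TOKEN"], os.environ["ACCESS_TOKEN_SECRET"],
    )
    api = tweepy.API(auth, wait_on_rate_limit=True)
    cursor = tweepy.Cursor(api.search_tweets, q=query, lang="id",
                           until=until, tweet_mode="extended")
    return [
        {"created_at": status.created_at.isoformat(),
         "user": status.user.screen_name,
         "text": status.full_text}
        for status in cursor.items(limit)
    ]


def save_tweets_csv(rows: List[Dict[str, str]], path: str) -> None:
    """Write tweet records to CSV; the UTF-8 BOM helps Excel read non-ASCII text."""
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=["created_at", "user", "text"])
        writer.writeheader()
        writer.writerows(rows)
```

A hypothetical call would be `save_tweets_csv(fetch_tweets("motor tol to:detikcom", until="2019-03-31"), "tweets.csv")`; the query string is only an illustration of searching for replies to one of the news accounts.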
The steps of the analysis carried out in this study are as follows.
1. Collect tweet data through the Twitter API:
a. Open the https://apps.twitter.com page and sign in to your personal Twitter account.
b. Create an API application with a clear name and purpose, so that Twitter is informed about what data retrieval is being done.
c. Configure the application and obtain the Consumer Key, Consumer Secret, Access Token, and Access Token Secret, which are required for data collection.
2. Preprocessing. Preprocessing is needed because raw data are often not ready for processing: they may be incomplete, noisy, or inconsistent (Hemalatha, 2012). The preprocessing stages for this text data are:
a. Delete tweets that contain no sentiment, as well as tweets that contain both sentiments at once.
b. Delete the word "RT", which marks a retweet or a response tweet.
e. Perform case folding, converting all text to lowercase.
g. Perform tokenizing, which splits each tweet into individual words.
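The preprocessing stages listed above can be sketched as small text functions. This sketch covers only stages b, e, and g; stage a is a manual filtering step, and stages c, d, and f are not described in this section, so they are deliberately left out. The function names are hypothetical, and the plain whitespace tokenizer is an assumption rather than the tokenizer the study used.

```python
import re
from typing import List


def remove_retweet_marker(text: str) -> str:
    """Stage b: delete the standalone token "RT" that marks a retweet."""
    # \b ensures words that merely contain "RT" (e.g. "SPORT") are kept.
    return re.sub(r"\bRT\b", " ", text)


def case_fold(text: str) -> str:
    """Stage e: convert every character to lowercase."""
    return text.lower()


def tokenize(text: str) -> List[str]:
    """Stage g: split a tweet into words (plain whitespace split, an assumption)."""
    return text.split()


def preprocess(text: str) -> List[str]:
    # RT removal runs before case folding so that only the uppercase marker is matched.
    return tokenize(case_fold(remove_retweet_marker(text)))
```

For example, `preprocess("RT @detikcom Motor Masuk Tol")` yields `["@detikcom", "motor", "masuk", "tol"]`.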
a. Calculate the probability of each Vj in the training data with equation (2.3), where Vj is a sentiment category: V1 = negative and V2 = positive.
c. Store the NBC probability model and use it in the testing stage.
d. Calculate the VMAP value of each sentiment category for the tested tweet with equation (2.1).
e. Find the maximum VMAP value and assign the tweet to the category with that maximum.
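The NBC training and testing steps can be sketched in pure Python. Because equations (2.1) and (2.3) are not reproduced in this section, the sketch assumes the common multinomial formulation with add-one (Laplace) smoothing: P(Vj) is the share of training tweets in category Vj, P(w | Vj) = (count of w in Vj + 1) / (total words in Vj + vocabulary size), and VMAP = argmax over Vj of P(Vj) multiplied by the product of P(w | Vj). Log probabilities are summed instead of multiplying raw probabilities to avoid numerical underflow. The function names and the dictionary layout of the stored model are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def train_nbc(docs: List[List[str]], labels: List[str]) -> dict:
    """Train a multinomial Naive Bayes model with add-one smoothing.

    docs: tokenized tweets; labels: "negative" or "positive" per tweet.
    The returned dict is the stored model used in the testing stage.
    """
    vocab = {w for doc in docs for w in doc}
    model = {"vocab": vocab, "prior": {}, "counts": {}, "totals": {}}
    for v in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == v]
        # Prior P(Vj): share of training tweets in category Vj
        model["prior"][v] = len(class_docs) / len(docs)
        counts = Counter(w for d in class_docs for w in d)
        model["counts"][v] = counts
        model["totals"][v] = sum(counts.values())
    return model


def classify_nbc(model: dict, doc: List[str]) -> Tuple[str, Dict[str, float]]:
    """Return the category with the highest log-posterior score (VMAP) and all scores."""
    vocab_size = len(model["vocab"])
    scores = {}
    for v, prior in model["prior"].items():
        score = math.log(prior)
        for w in doc:
            if w not in model["vocab"]:
                continue  # words never seen in training are skipped
            p = (model["counts"][v][w] + 1) / (model["totals"][v] + vocab_size)
            score += math.log(p)
        scores[v] = score
    return max(scores, key=scores.get), scores
```

With four toy tweets, two per category, `classify_nbc(train_nbc(docs, labels), ["tol", "setuju"])` picks whichever category gives those words the larger smoothed probabilities; the stored model can be serialized (for example with `pickle`) between training and testing.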
5. Classify the sentiment data using the Support Vector Machine (SVM) method for each account.
a. Determine the weighting parameter of the SVM and build the SVM model.
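A linear SVM for this step can be sketched without external libraries. This is a swapped-in illustration, not necessarily the study's implementation: it trains a linear SVM on bag-of-words counts by Pegasos-style subgradient descent on the L2-regularized hinge loss, visiting tweets in a fixed order. The regularization strength `lam` is assumed to play the role of the "weighting parameter" (it corresponds to C = 1/(lam * n) in the usual soft-margin form); the study may instead mean C itself, a term-weighting scheme, or a kernel SVM from a library. All function names are hypothetical.

```python
from typing import Dict, List


def build_vocab(docs: List[List[str]]) -> Dict[str, int]:
    """Map each distinct training word to a feature column index."""
    vocab: Dict[str, int] = {}
    for doc in docs:
        for w in doc:
            vocab.setdefault(w, len(vocab))
    return vocab


def featurize(doc: List[str], vocab: Dict[str, int]) -> List[float]:
    """Bag-of-words counts plus a constant 1.0 feature acting as a bias term."""
    x = [0.0] * (len(vocab) + 1)
    x[-1] = 1.0
    for w in doc:
        if w in vocab:  # words unseen in training are ignored
            x[vocab[w]] += 1.0
    return x


def train_linear_svm(X: List[List[float]], y: List[int],
                     lam: float = 0.01, epochs: int = 50) -> List[float]:
    """Minimize lam/2 * ||w||^2 + mean hinge loss; y[i] must be +1 or -1.

    For simplicity the bias weight is regularized like the other weights.
    """
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the regularizer
            if margin < 1.0:  # hinge loss is active: move toward the correct side
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict_svm(w: List[float], x: List[float]) -> str:
    """Label a featurized tweet by the sign of the decision function."""
    score = sum(wj * xj for wj, xj in zip(w, x))
    return "positive" if score >= 0 else "negative"
```

Labels are converted to +1 (positive) and -1 (negative) before training; a larger `lam` gives a simpler model that tolerates more training errors.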
6. Form clusters based on the words or terms that appear in negative and positive sentiment tweets to find the specific topics discussed in the news tweets.
a. Separate the negative sentiment data and the positive sentiment data from the initial data.
d. Identify the most frequent words in each cluster, which form the specific terms or phrases that are often discussed.
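Steps a and d of the clustering stage can be sketched as follows. The clustering algorithm itself belongs to steps b and c, which are not described in this section, so the sketch takes the cluster assignment of each tweet as an assumed input produced by whatever method the study applied. The function names and the default of three top terms are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List


def split_by_sentiment(docs: List[List[str]],
                       labels: List[str]) -> Dict[str, List[List[str]]]:
    """Step a: separate negative and positive tweets from the initial data."""
    groups: Dict[str, List[List[str]]] = {"negative": [], "positive": []}
    for doc, label in zip(docs, labels):
        groups[label].append(doc)
    return groups


def top_terms_per_cluster(docs: List[List[str]], cluster_ids: List[Hashable],
                          n: int = 3) -> Dict[Hashable, List[str]]:
    """Step d: list the n most frequent words in each cluster.

    cluster_ids[i] is the (assumed, externally computed) cluster of docs[i].
    """
    counters: Dict[Hashable, Counter] = defaultdict(Counter)
    for doc, cluster in zip(docs, cluster_ids):
        counters[cluster].update(doc)
    return {c: [w for w, _ in cnt.most_common(n)] for c, cnt in counters.items()}
```

The functions are applied separately to the negative and the positive group, so each sentiment gets its own set of cluster topics.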