
Topic: Analysis of Community Sentiment on the Discourse of Toll Road Entry for Motorbikes Based on Opinions from Twitter Using the Naïve Bayes Classifier and Support Vector Machine Methods

Transportation is a major component of everyday life and of governmental and social systems. The socio-demographic conditions of a region influence its transportation performance. Population density in particular has a significant influence on the capacity of transportation to serve the needs of the community. Urban areas tend to experience high population growth due to birth rates and urbanization (Susantoro and Parikesit, 2004).

Transportation infrastructure is needed to support mobility, and its development is one of the key drivers of a country's economic growth. In Indonesia, transportation infrastructure development over the next 15 years is estimated to cost IDR 1,786 trillion, including investment of IDR 339 trillion for roads, IDR 117 trillion for ports, IDR 32 trillion for airports, and IDR 326 trillion for railways (Susantono and Berawi, 2012). Part of the road investment goes to the construction of highways, or toll roads. Under Regulation of the Minister of Public Works and Public Housing (Permen PUPR) No. 10/PRT/M/2018, toll roads are public roads that are part of the road network system and serve as national roads whose users are required to pay a toll. As of February 2019, 48 toll roads were operating in Indonesia, with a total length of more than 1,000 km (BPJT, 2019). However, Government Regulation (PP) No. 44 of 2009, which amends PP No. 15 of 2005 concerning Toll Roads, states that toll roads are intended only for vehicles with four or more wheels. It follows that motorbikes are not permitted to enter toll roads.

Vehicle data show how large the vehicle population had become by 2018. According to Indonesian National Police data, 111,571,239 vehicles (about 111 million) were registered in Indonesia as of January 1, 2018. Motorbikes dominate this total, accounting for about 82% (91,085,532 units), followed by private cars at about 12% (13,253,143 units); the remainder consists of buses, goods vehicles, and special vehicles. Meanwhile, according to regulations issued by the Ministry of Public Works and Public Housing through the Toll Road Regulatory Agency, toll roads have the following objectives and benefits.

The Purposes of Toll Road Operation

1. Smoothing traffic flow in developed areas.

2. Improving the distribution of goods and services to support economic growth.

3. Increasing equity in development outcomes and justice.

4. Easing the burden on government funds through the participation of road users.

Because prohibiting motorbikes from entering toll roads is considered incompatible with purpose number 3 of toll road operation, namely increasing equity in development outcomes and justice, the Speaker of the Indonesian House of Representatives, Bambang Soesatyo, proposed that the government provide a special lane for motorbikes on toll roads (Kompas.com, 2019). Providing special motorbike lanes on toll roads would get around the prohibition on motorbikes entering toll roads. Such lanes are also permitted by Government Regulation (PP) No. 44 of 2009, which revises Article 38 of PP No. 15 of 2005 concerning Toll Roads: paragraph (1a) states that toll roads may be equipped with special lanes for two-wheeled motor vehicles, provided that these lanes are physically separated from the lanes for vehicles with four or more wheels.

The concept of toll roads with special motorcycle lanes has already been implemented on several toll roads in Indonesia, namely the Surabaya-Madura section and the Bali Mandara section. The Toll Road Regulatory Agency (BPJT) is currently reviewing proposals to allow motorbikes on Jakarta toll road sections (Detikfinance, 2019). Based on the vehicle population data above, 62.81% of vehicles are located on Java. The Metro Jaya Regional Police (Polda Metro Jaya) recorded 13,765,308 registered motorbikes up to 2018, which places DKI Jakarta province second in the number of registered motorbikes (Polri, 2018). Although DKI Jakarta ranks below East Java province, DKI Jakarta covers only 664 km², far smaller than East Java's 47,799 km² (Ministry of Home Affairs, 2016). The density of motorbikes in DKI Jakarta is therefore much higher than in East Java. This has given rise to the discourse of a special motorcycle lane on the Jakarta toll road sections.

However, the discourse of allowing motorbikes onto toll roads has drawn both support and opposition. The Head of BPJT, Danang Parikesit, said that BPJT is currently reviewing the proposal to allow motorbikes onto toll roads, and that accident risk is one of the factors being considered (Detikfinance, 2019). Various segments of society have also voiced their support for or rejection of the discourse. In the past, people expressed their opinions, criticisms, and suggestions through print media, where not everyone had the ability to write or the opportunity to publish. With the advance of communication technology, however, people increasingly express their opinions on social networks. One of the most popular social networks among internet users today is Twitter.

Twitter is a social networking and microblogging service that allows users to send and read text-based messages, known as tweets, of up to 280 characters (140 characters before November 2017). Twitter usage has grown very rapidly. According to the Techno.id site, Twitter has 302 million active users, 80 percent of whom access it from mobile devices; 37 percent of Twitter users are 18-29 years old, while 25 percent are 30-49 years old. With that many active users, Twitter receives 500 million tweets every day. As of April 2018, Indonesia had the third-largest number of Twitter users in the world, with 140 million users (The Next Web, 2018). Its simplicity and ease of use are among the reasons Twitter is so popular among Indonesians for communication.

Sentiment analysis is the computational detection and study of opinions, sentiments, emotions, and subjectivity in text. As a specific text mining application, sentiment analysis is concerned with the automatic extraction of positive or negative opinions from text (He, Wu, Yan, Akula, & Shen, 2015). In this study, the tweet data are classified into positive and negative opinions using text classification methods. There are various document classification techniques, including the Naïve Bayes classifier, Decision Trees, and Support Vector Machines. One of the most popular methods for classifying documents today is the Naïve Bayes classifier (Natalius, 2011), which offers high speed and accuracy when applied to large databases and diverse data (Larose, 2006). Rita McCue reports similar findings: the Naïve Bayes classifier has several advantages, including simplicity, speed, and high accuracy (McCue, 2009).

Sentiment analysis has been studied before. Among previous studies are those that classify the sentiment of film reviews using various machine learning techniques, namely Naïve Bayes, Maximum Entropy, and Support Vector Machines (SVM). Parikh and Movassate conducted sentiment analysis on the Twitter social network using several classification techniques. Another study that the author used as a reference is the work of Rozi, Pramono, and Dahlan (2012), which applied the Naïve Bayes method to determine sentiment polarity and showed fairly high accuracy for the Naïve Bayes Classifier. On this basis, the author intends to apply the Naïve Bayes Classifier method to examine public sentiment on Twitter toward the discourse of toll road entry for motorbikes.

1.2 Problem Formulation

Previous research has used the Naïve Bayes Classifier and SVM methods to classify tweet data into negative and positive opinions. However, these studies focus mostly on sentiment toward particular companies or political parties, even though sentiment analysis can also be applied to other topics, and they rely on traditional methods. This study therefore addresses the following problems: how can public sentiment toward the discourse of toll road entry for motorbikes, based on opinions from Twitter, be analyzed using the Naïve Bayes Classifier and Support Vector Machine methods? And how can the specific topics that the community frequently mentions regarding this discourse be identified?

1.3 Research Objectives

Based on the problem formulation above, the objectives of this study are as follows:

1. Obtain the classification of public sentiment toward the discourse of toll road entry for motorbikes, based on opinions from Twitter, using the Naïve Bayes Classifier method.

2. Obtain the classification of public sentiment toward the discourse of toll road entry for motorbikes, based on opinions from Twitter, using the Support Vector Machine method.

3. Compare the classification results of the Naïve Bayes Classifier method with those of the Support Vector Machine method.

4. Identify the specific topics discussed in community tweets by forming clusters separately for negative and positive sentiment.

1.4 Benefits of Research

The expected benefits of this study are as follows:

1. Provide information about sentiment analysis of tweet data and how tweets are classified.

2. Provide a descriptive, cluster-based overview of the specific topics in community tweets, based on the negative and positive classifications that have been formed.

3. Serve as input for relevant agencies in making decisions and formulating appropriate policies/programs to improve equity in development outcomes and justice.

1.5 Limitations of the Problem

This study is limited as follows:

1. The data are in Indonesian.

2. The data are tweets collected from Twitter between 29 January 2019 and 30 March 2019.

3. Only tweets manually labeled as having positive or negative sentiment are classified.

4. The data are retrieved using the keywords @detikcom and @detikoto.

3.1 Data and Data Sources

This research is a sentiment analysis of Twitter data about news on the discourse of allowing motorbikes onto toll roads. The data are tweets from residents that respond to or relate to this news, retrieved by searching for the Twitter accounts that disseminated it, namely @detikcom and @detikoto.

Data are retrieved through a dedicated process using the Twitter Application Programming Interface (API), and only Indonesian-language data are collected. The Twitter API gives users access to tweet data so that they can process and analyze information from Twitter. The process involves creating a Consumer Key, Consumer Secret, Access Token, and Access Token Secret, which act as keys that identify the application to Twitter. With these four keys, the tweet data are retrieved and saved in Comma-Separated Values (CSV) format, which can be opened in Excel. The tweet data were collected from January 29, 2019 to March 30, 2019.
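A minimal Python sketch of the retrieval and CSV-saving steps follows. It assumes the third-party tweepy library (version 4.x) and the Twitter API v1.1 standard search endpoint, which Twitter/X has since restricted; the function names `fetch_tweets` and `save_tweets_csv`, the CSV columns, and the `keys` dictionary are hypothetical. The standard search endpoint only returns tweets from roughly the last seven days, so a two-month window like the one above would require repeated collection during the period.

```python
import csv
from typing import Iterable, Iterator


def fetch_tweets(query: str, keys: dict, limit: int = 1000) -> Iterator[dict]:
    """Yield Indonesian tweets matching `query` (assumes tweepy 4.x and valid keys)."""
    import tweepy  # imported lazily so save_tweets_csv works without tweepy installed

    auth = tweepy.OAuth1UserHandler(
        keys["consumer_key"], keys["consumer_secret"],
        keys["access_token"], keys["access_token_secret"],
    )
    api = tweepy.API(auth, wait_on_rate_limit=True)
    # tweet_mode="extended" makes the full text available as status.full_text
    cursor = tweepy.Cursor(api.search_tweets, q=query, lang="id", tweet_mode="extended")
    for status in cursor.items(limit):
        yield {
            "id": status.id,
            "created_at": status.created_at.isoformat(),
            "user": status.user.screen_name,
            "text": status.full_text,
        }


def save_tweets_csv(tweets: Iterable[dict], path: str) -> int:
    """Write tweet dicts to a UTF-8 CSV file and return the number of rows written."""
    fields = ["id", "created_at", "user", "text"]
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for tweet in tweets:
            writer.writerow(tweet)
            count += 1
    return count
```

For example, `save_tweets_csv(fetch_tweets("@detikcom OR @detikoto", keys), "tweets.csv")` would write matching tweets to `tweets.csv`, where `keys` holds the four credentials. Keeping the CSV writer separate from the API call means already-collected tweets can be saved or re-saved without network access.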

3.3 Steps of Analysis

The analysis in this study is carried out in the following steps.

1. Retrieve the tweet data via the Twitter API.

a. Open the https://apps.twitter.com page and sign in to a personal Twitter account.

b. Create an API application with a clear name and purpose. This tells Twitter what data will be retrieved and why.

c. Configure the application and obtain the Consumer Key, Consumer Secret, Access Token, and Access Token Secret, which are required for data collection.

d. Enter the search keywords detikcom and detikoto.

e. Save the crawled results as a dataset in CSV format for use in Excel.

2. Preprocessing. Preprocessing is needed because raw data are not yet ready for processing: they may be incomplete, noisy, or inconsistent (Hemalatha, 2012). The preprocessing stages for this text data are:

a. Delete tweets that contain no sentiment, as well as tweets that contain two sentiments at once.

b. Delete the word "RT", which marks a retweet.

c. Remove URLs.

d. Remove symbols and punctuation marks, such as ` ~ ! @ # $ % ^ & * ( ) _ + - = { } [ ] \ | : ; " ’ < > , .

e. Apply case folding, converting all text to lowercase.

f. Delete words in the tweet text that appear in the stopword list.

g. Perform tokenizing, which splits each tweet into individual words.

3. Build a document-term matrix and apply TF-IDF weighting.

4. Classify the data using the Naïve Bayes Classifier for each account.

a. Calculate the probability of each category Vj in the training data with equation (2.3), where Vj is a sentiment category: V1 = negative and V2 = positive.

b. Calculate the probability of each word ai in category Vj with equation (2.4).

c. Store the NBC probability model and use it in the testing stage.

d. Calculate the probability of each tested sentiment category (VMAP) with equation (2.1).

e. Find the maximum VMAP value and assign the tweet to the category with the maximum VMAP.

5. Classify the sentiment data using the Support Vector Machine (SVM) method for each account.

a. Determine the weighting parameters of the SVM and build the SVM model.

b. Specify the parameter values of the SVM for each kernel type.

c. Build SVM models using the linear and Radial Basis Function (RBF) kernels.

6. Form clusters of the words or terms that appear in negative and positive sentiment tweets to identify the specific topics discussed in the news tweets.

a. Separate the negative sentiment data and the positive sentiment data from the initial data.

b. Apply the K-means clustering algorithm to group words or terms.

c. Evaluate different numbers of clusters to find the optimal number.

d. Examine the most frequent words in each cluster to identify specific term phrases that are often discussed.

7. Interpret the results and draw conclusions.
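Preprocessing steps 2b-2g can be sketched in Python with only the standard library. The regular expressions, the function name `preprocess`, and the tiny `STOPWORDS` set are illustrative assumptions; a real study would use a full Indonesian stopword list. Step 2a is a manual labeling decision and is not automated here, and the sketch tokenizes before removing stopwords, since stopwords are easiest to match as whole tokens.

```python
import re
import string

# Hypothetical, deliberately tiny Indonesian stopword list for illustration only.
STOPWORDS = {"yang", "di", "dan", "ke", "ini", "itu", "untuk", "dengan", "ada", "juga"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
RT_RE = re.compile(r"\bRT\b")
# Map every ASCII punctuation character (plus the curly apostrophe) to a space.
PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation + "’"})


def preprocess(tweet, stopwords=STOPWORDS):
    """Clean one tweet and return its list of tokens."""
    text = RT_RE.sub(" ", tweet)        # b. drop the retweet marker "RT"
    text = URL_RE.sub(" ", text)        # c. remove URLs
    text = text.translate(PUNCT_TABLE)  # d. replace symbols and punctuation with spaces
    text = text.lower()                 # e. case folding
    tokens = text.split()               # g. tokenize on whitespace
    return [t for t in tokens if t not in stopwords]  # f. remove stopwords
```

For example, `preprocess("RT @detikoto: Motor masuk tol? Ini bahaya! https://t.co/abc123")` returns `['detikoto', 'motor', 'masuk', 'tol', 'bahaya']`. Account mentions survive as plain words (here `detikoto`) because step 2d only strips the `@` symbol.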
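Step 3 can be sketched as follows. The text does not give the exact TF-IDF formula, so this sketch assumes one common variant: raw term frequency multiplied by idf = ln(N / df), where N is the number of tweets and df is the number of tweets containing the term. The function name `tfidf_matrix` is hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Build a TF-IDF document-term matrix from tokenized documents.

    Returns (vocab, rows), where rows[d][k] is the weight of vocab[k] in document d.
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    rows = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts in this document
        rows.append([tf[term] * idf[term] for term in vocab])
    return vocab, rows
```

With this variant, a term that appears in every tweet gets idf = ln(1) = 0 and so carries no weight. Library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed idf and L2 row normalization by default, so their weights will differ.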
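Equations (2.1), (2.3), and (2.4) are not reproduced in this section. Assuming they follow the commonly used multinomial Naïve Bayes formulation with Laplace (add-one) smoothing, the decision rule of step 4d, the category probability of step 4a, and the word probability of step 4b would read as below, where |docs_j| is the number of training tweets in category V_j, |D| the total number of training tweets, n_i the number of occurrences of word a_i in category V_j, n the total number of words in V_j, and |Vocabulary| the number of distinct words in the training data.

```latex
V_{MAP} = \arg\max_{V_j \in \{V_1, V_2\}} P(V_j) \prod_{i} P(a_i \mid V_j)

P(V_j) = \frac{|docs_j|}{|D|}

P(a_i \mid V_j) = \frac{n_i + 1}{n + |Vocabulary|}
```

In practice the product in the first formula is computed as a sum of logarithms, which selects the same category while avoiding numerical underflow.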
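The Naïve Bayes steps 4a-4e can be sketched in Python as below, as an illustration under assumptions: it uses word counts rather than TF-IDF weights, applies Laplace smoothing, and uses the hypothetical function names `train_nb` and `classify_nb`. Words in a test tweet that never appeared in training are skipped, which is one common convention.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Train multinomial Naïve Bayes with Laplace (add-one) smoothing.

    docs: list of token lists; labels: category per document, e.g. "negative"/"positive".
    Returns (priors, likelihoods, vocab).
    """
    vocab = {t for doc in docs for t in doc}
    priors, likelihoods = {}, {}
    for v in set(labels):
        class_docs = [d for d, y in zip(docs, labels) if y == v]
        priors[v] = len(class_docs) / len(docs)  # P(V_j)
        counts = Counter(t for d in class_docs for t in d)
        n = sum(counts.values())  # total number of words in category V_j
        # P(a_i | V_j) with add-one smoothing over the whole vocabulary
        likelihoods[v] = {t: (counts[t] + 1) / (n + len(vocab)) for t in vocab}
    return priors, likelihoods, vocab


def classify_nb(model, tokens):
    """Return the V_MAP category, comparing log-probabilities to avoid underflow."""
    priors, likelihoods, vocab = model
    scores = {}
    for v in priors:
        score = math.log(priors[v])
        for t in tokens:
            if t in vocab:  # words never seen in training are ignored
                score += math.log(likelihoods[v][t])
        scores[v] = score
    return max(scores, key=scores.get)
```

For instance, training on `[["motor", "tol", "bahaya"], ["bahaya", "kecelakaan"], ["setuju", "adil"], ["setuju", "motor", "tol"]]` with labels `["negative", "negative", "positive", "positive"]` and classifying `["bahaya", "tol"]` returns `"negative"`.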
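Step 5 can be sketched without external libraries as a kernelized Pegasos solver, a stochastic sub-gradient method for the soft-margin SVM objective. This is a swapped-in training method (LIBSVM, which scikit-learn's `SVC` wraps, uses an SMO-type solver instead), it omits the bias term for simplicity, and the names and defaults (`lam`, `epochs`, `gamma`) are assumptions. The regularization weight `lam` corresponds to 1/(C·m) in the usual C-parameterized SVM with m training tweets, and `gamma` is the RBF kernel width; these would play the role of the parameters chosen per kernel in steps 5a-5b.

```python
import math
import random


def linear_kernel(x, z):
    """K(x, z) = x . z"""
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x, z, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2)"""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def train_kernel_svm(X, y, kernel, lam=0.01, epochs=50, seed=0):
    """Kernelized Pegasos on the hinge-loss SVM objective (no bias term).

    X: feature vectors (e.g. TF-IDF rows); y: labels in {-1, +1}.
    Returns alpha, the number of margin violations counted for each training point.
    """
    rng = random.Random(seed)
    m = len(X)
    alpha = [0] * m
    gram = [[kernel(a, b) for b in X] for a in X]  # gram[j][i] = K(X[j], X[i])
    for t in range(1, epochs * m + 1):
        i = rng.randrange(m)
        f = sum(alpha[j] * y[j] * gram[j][i] for j in range(m)) / (lam * t)
        if y[i] * f < 1:  # margin violated: the sub-gradient step adds point i
            alpha[i] += 1
    return alpha


def predict_svm(X_train, y_train, alpha, kernel, x):
    """Sign of sum_j alpha_j y_j K(x_j, x); the 1/(lam*T) scale does not change the sign."""
    s = sum(a * yj * kernel(xj, x) for a, yj, xj in zip(alpha, y_train, X_train))
    return 1 if s >= 0 else -1
```

For the linear model pass `linear_kernel` as `kernel`; for the RBF model pass, for example, `functools.partial(rbf_kernel, gamma=1.0)`. Negative and positive tweets would be encoded as -1 and +1 before training.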
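Step 6 can be sketched with a plain implementation of Lloyd's K-means. Here each term is represented by its column of TF-IDF weights across the tweets of one sentiment class (an assumption, since the text does not say how terms are encoded), and the within-cluster sum of squared errors (SSE) is returned so that several values of k can be compared with the elbow method for step 6c; the text does not name its evaluation criterion, so this too is an assumption. The function names `kmeans` and `term_vectors` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def term_vectors(vocab, rows):
    """Represent each term by its column of weights across the documents in rows."""
    return [[row[k] for row in rows] for k in range(len(vocab))]


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-means. Returns (labels, centroids, sse)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k random starting points
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse
```

Running `kmeans` for k = 1, 2, 3, and so on, the k at which the SSE stops dropping sharply (the elbow) is a common choice. The terms in cluster c, `[vocab[i] for i, lab in enumerate(labels) if lab == c]`, can then be inspected for step 6d.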
