Procedia Computer Science 227 (2023) 398–405
8th International Conference on Computer Science and Computational Intelligence (ICCSCI 2023)
Sentiment analysis of the Indonesian community toward face-to-face learning during the Covid-19 pandemic
Andrew Giovanni Gozal (a), Hady Pranoto (a,*), Muhammad Fikri Hasani (a)
(a) Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
Abstract
This research analyzes the opinions of the Indonesian people on the learning system in the middle of the Covid-19
pandemic. The data were taken from public comments on the Twitter social media platform using scraping techniques with
appropriate keywords. The data were then split into training and testing sets and classified using the K-Nearest Neighbor
(K-NN) method. After classification, the accuracy, F1-score, recall, and precision were computed from the confusion matrix.
The results showed that K-NN performed well, with an F1-score above 70% for each class. According to the confusion matrix,
accuracy was also promising at 82%. Future research may include oversampling the classes with fewer samples. K-fold
cross-validation can also be used to assess the general performance of the model. The same method may be used to find
sentiment towards a political policy: whether the policy gets a good or bad response and, if the response is bad, which
factors cause the negative sentiment. In this way, the will of the public can be identified.
© 2023 The Authors. Published by Elsevier B.V.

This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0)
Peer-review under responsibility of the scientific committee of the 8th International Conference on Computer Science
and Computational Intelligence 2023
Keywords: sentiment analysis; face-to-face learning; K-NN
* Corresponding author. Tel.: +62-021-534-5830; fax: +62-021-535-0660.

E-mail address: hadypranoto@binus.ac.id
1877-0509 © 2023 The Authors. Published by Elsevier B.V.

10.1016/j.procs.2023.10.539
1. Introduction
The COVID-19 pandemic, declared a global pandemic by the World Health Organization (WHO) on March 11,
2020, has resulted in many countries instructing their citizens to stay at home, avoid physical contact, and practice
social distancing. Governments also issued similar instructions to all educational institutions to conduct classes online.
As a result, virtual learning became the only option for students and teachers to communicate [1]. However, since
the new normal was implemented, face-to-face learning in educational institutions has begun to be applied in green zones,
with priority given to higher education levels first and with consideration of students' ability to follow
health protocols. This has become a trending topic on Twitter, as many parents are concerned about this policy because
they fear the spread of new COVID-19 clusters, which continue to develop in Indonesia. On the other hand,
many also believe that this is a good policy, as online learning is considered ineffective: many students struggle
to receive material delivered by teachers online, and some students lack adequate devices. Face-to-face learning
is a conventional learning model that seeks to provide knowledge to students by bringing together teachers and students
in one room for planned, place-based, and social interaction-oriented learning [2].
Sentiment analysis can extract information on public perception of a topic using machine learning approaches [3],
one of which is K-Nearest Neighbour (KNN). KNN is a machine learning approach in which a data point is
classified based on its K nearest neighbors, where K is an integer hyperparameter. Several previous
studies have been conducted for sentiment analysis using machine learning. First is research about sentiment analysis
for e-payment in Indonesia [4]. This research was conducted using KNN and naïve Bayes, and the study showed that
KNN is superior to naïve Bayes with accuracy above 80%. The second study is about law enforcement sentiment
analysis [5]. This research improved the KNN with Particle Swarm Optimization (PSO) and performed better than
SVM. The following study is about sentiment analysis for e-commerce [6]. This study compares SVM and KNN and
shows that KNN performs better with accuracy above 80%, compared to SVM, which is around 75%.
Based on the issues that have arisen, an artificial intelligence solution is needed to perform sentiment analysis for
online learning during the Covid-19 pandemic. This research proposes a fast, lightweight, contextual, and timely
sentiment analysis of Twitter posts using KNN and TF-iDF vectors for the Indonesian language. Other research
may use a corpus crawled from the internet, but this research uses Twitter posts. The result can be used as material for
consideration by policymakers in government or private organizations. Sentiment research on people's views of
face-to-face learning needs to be done because face-to-face learning policies have become a subject of public debate.
Some people are afraid of the spread of Covid-19; on the other side, some think online learning
makes it difficult for students to understand lessons.
2. Literature Review
2.1. Sentiment Analysis
Sentiment analysis is the analysis of text data to identify the sentiment or opinion conveyed [3]. It entails
categorizing content as positive, negative, or neutral at several levels, including document, sentence, or aspect [7].
Information can be gleaned from consumer reviews, posts on social media, and other sources of opinion-based data
using sentiment analysis [8]. Natural language processing (NLP) and machine learning are techniques used in
sentiment analysis, and they may also leverage knowledge-based representations, semantic relations, and reasoning
to improve the precision and expressiveness of sentiment analysis systems [8]. Sentiment analysis has applications in
various industries, including marketing, healthcare, and politics, and can be used to gauge public sentiment and find
potential[9].
2.2. K-Nearest Neighbor
K-Nearest Neighbour (KNN) is a machine learning algorithm used to solve classification or regression problems
[10]. The algorithm predicts the value for a data point from its K closest data points, either by averaging their
values (regression) or by choosing the majority class (classification). The K closest data points are drawn from the
training data set: the query point is compared with each data point in the training data, and the K closest are used.
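The majority-vote procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the toy two-dimensional points, and the use of Euclidean distance here are assumptions made for the example.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Compare the query with every training point and sort by distance.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data (hypothetical): label -1 clusters near the origin, label 1 near (1, 1).
train_X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (1.2, 0.8)]
train_y = [-1, -1, 1, 1, 1]
prediction = knn_predict(train_X, train_y, (0.8, 0.9), k=3)
```

Because every prediction scans the whole training set, the cost of a query grows with the number of training points, which is acceptable for a corpus of the size used in this study.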
3. Methodology
The research method in this study consists of several steps, which can be seen in Figure 1.
Fig. 1. Research Methodology Flowchart
3.1. Data Collection

Data collection is a crucial step that must be carried out with utmost care, as it can affect the results. Data related
to Indonesian community sentiment towards face-to-face learning during the COVID-19 pandemic were collected from Twitter
using scraping techniques. The tweets were extracted using keywords such as
"#PembelajaranTatapMuka", "BelajarTatapMuka", and "#belajartatapmuka", and the tweets had to be in the Indonesian
language.
3.1.1. Splitting Data
The corpus collected by crawling Twitter is split into training and testing data. The splitting ratio
is 80:20: 80% of the data is used for training, and 20% is used for testing.
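An 80:20 split can be sketched as follows. The shuffling, the fixed seed, and the placeholder corpus are assumptions for illustration; the paper does not state whether the split was shuffled or stratified by class.

```python
import random

def split_80_20(samples, train_ratio=0.8, seed=42):
    """Shuffle the sample indices reproducibly and cut them at train_ratio."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    cut = int(len(samples) * train_ratio)
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test

# Placeholder corpus (hypothetical) standing in for the scraped tweets.
corpus = [f"tweet {i}" for i in range(1500)]
train_set, test_set = split_80_20(corpus)
```

Keeping the test portion untouched until evaluation is what allows it to stand in for unseen data later in the pipeline.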
3.1.2. Data Preprocessing
Data preprocessing is the process of cleaning imperfect and unclear data for further analysis. This process is
carried out on both the training and testing data. The steps involved in data preprocessing are as follows:
• Cleansing: cleans the data by removing punctuation marks and other non-text characters.
• Case folding: standardizes the capitalization of words to lowercase and removes elements such as URLs (http://),
mentions (@), and delimiters such as commas (,) and other punctuation marks.
• Tokenizing: divides an input string into individual words.
• Stopword removal: removes words that are not important or meaningful. The standard stopword list used is a
dynamic stopword list, which is generated through a specific process that usually involves the corpus to be used.
• Stemming: converts affixed words in a sentence to their base form. The removed affixes include prefixes,
infixes, suffixes, and combinations of prefixes and suffixes. The standard stemming used is the Ahmad Yusoff
Sembok algorithm, which does not have specific rules or standards for the order of affix removal in each word.
The Ahmad Yusoff Sembok algorithm has four variations for affix removal: prefixes, infixes, confixes
(combination), and suffixes.
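The first four steps above can be sketched with the Python standard library alone. This is an illustrative sketch under stated assumptions: the regular expressions, the order of operations, and the six-word stopword set are hypothetical choices for the example, and stemming is left out because it depends on a language-specific algorithm.

```python
import re

# Tiny illustrative stopword set (assumption); a real run would use a full Indonesian list.
STOPWORDS = {"dan", "dengan", "terkait", "yang", "akan", "pada"}

def preprocess(tweet):
    """Case-fold, clean, tokenize, and remove stopwords from one tweet."""
    text = tweet.lower()                               # case folding
    text = re.sub(r"https?://\S+", " ", text)          # remove URLs
    text = re.sub(r"@\w+", " ", text)                  # remove mentions
    text = re.sub(r"[^a-z0-9\s]", " ", text)           # cleansing: punctuation and symbols
    tokens = text.split()                              # whitespace tokenizing
    return [t for t in tokens if t not in STOPWORDS]   # stopword removal

sample = ("Kanit Binmas berkoordinasi dengan Kepala Sekolah SMAN 1 Piyungan "
          "terkait dengan Pembelajaran Tatap Muka (PTM) https://t.co/bwH9SLmxPN")
tokens = preprocess(sample)
```

URLs and mentions are removed before the generic punctuation filter so that their fragments do not survive as stray tokens.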
3.1.3. Data Labelling
The collected data was still raw, so the class of each tweet had not yet been assigned. Accordingly, each tweet
was labeled with one of 3 classes: 1 (positive or pro), 0 (neutral), and -1 (negative or contra). After labeling,
class -1 (contra) contains 506 tweets, class 0 (neutral) contains 613 tweets, and class 1 (pro) contains
382 tweets.
3.1.4. Text Vectorization
Text vectorization in this research was done with the TF-iDF approach. Each tweet is converted
to a TF-iDF vector with one weight per token. The vectorizer and its lookup table (vocabulary) are fitted on the
train set only. The TF-iDF process was done using the scikit-learn Python library.
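To make the "fit on the train set only" idea concrete, the sketch below computes TF-IDF weights in plain Python. It uses the smoothed IDF and L2 normalisation that scikit-learn's TfidfVectorizer applies by default; whether the authors kept those defaults is an assumption, and the function names and toy documents are hypothetical.

```python
import math
from collections import Counter

def fit_idf(train_docs):
    """Learn the vocabulary and smoothed IDF weights from training documents only."""
    n = len(train_docs)
    doc_freq = Counter(term for doc in train_docs for term in set(doc))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    return {term: math.log((1 + n) / (1 + df)) + 1 for term, df in doc_freq.items()}

def tfidf_vector(doc, idf):
    """Weight a tokenized document with a fitted IDF table and L2-normalise it."""
    tf = Counter(t for t in doc if t in idf)       # terms unseen in training are ignored
    weights = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}

# Toy tokenized documents (hypothetical).
train_docs = [["tatap", "muka", "aman"], ["tatap", "muka", "takut"], ["daring", "sulit"]]
idf = fit_idf(train_docs)
test_vec = tfidf_vector(["tatap", "muka", "bahaya"], idf)
```

The test document is weighted with the table learned from the training documents, so a word that never appeared in training ("bahaya" here) contributes nothing to its vector.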
3.1.5. Model Training
In this step, the machine learning model is trained using the TF-iDF text vectors that were built previously.
Because there are only 3 sentiment classes, the K value was set to 3. Cosine similarity was used as the distance
measurement between data points. The formula for cosine similarity can be seen in Equation 1.
\[ \cos \alpha = \frac{A \cdot B}{|A|\,|B|} \tag{1} \]
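Equation 1 translates directly into code. The sketch below is a plain-Python illustration; the function name, the example vectors, and the convention of returning 0.0 for an all-zero vector are assumptions made for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (Equation 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

a = [1.0, 0.0, 1.0]
b = [1.0, 1.0, 0.0]
sim = cosine_similarity(a, b)  # one shared dimension out of two each: about 0.5
```

For vectors scaled to unit length, |A - B|^2 = 2 - 2 cos α, so ranking neighbours by Euclidean distance or by cosine similarity gives the same order.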
3.1.6. Model evaluation
In this step, the trained machine learning model is evaluated using the precision, recall, F1-score, and accuracy
metrics (Equations 2, 3, 4, and 5). Each metric can be calculated by first counting the true positives (TP), true
negatives (TN), false positives (FP), and false negatives (FN).
\[ \text{Precision} = \frac{TP}{TP + FP} \tag{2} \]
\[ \text{Recall} = \frac{TP}{TP + FN} \tag{3} \]
\[ F1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4} \]
\[ \text{Accuracy} = \frac{TP + TN}{P + N} \tag{5} \]
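With three sentiment classes, the TP, FP, FN, and TN counts are usually obtained per class by treating that class as "positive" and the other two as "negative" (one-vs-rest). The sketch below shows that bookkeeping on a toy confusion matrix; the matrix values, the row/column convention, and the function names are assumptions for illustration and are not the paper's data.

```python
def one_vs_rest_counts(matrix, labels):
    """matrix[i][j] = number of samples with true class i predicted as class j."""
    total = sum(sum(row) for row in matrix)
    counts = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                   # true class k, predicted as another class
        fp = sum(row[k] for row in matrix) - tp    # predicted class k, true class differs
        tn = total - tp - fn - fp
        counts[label] = {"TP": tp, "FN": fn, "FP": fp, "TN": tn}
    return counts

def precision_recall_f1(c):
    """Equations 2 to 4 for one class (assumes non-zero denominators)."""
    precision = c["TP"] / (c["TP"] + c["FP"])
    recall = c["TP"] / (c["TP"] + c["FN"])
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

labels = ["pro", "neutral", "contra"]
toy_matrix = [[8, 1, 1],
              [2, 6, 2],
              [0, 1, 9]]
counts = one_vs_rest_counts(toy_matrix, labels)
accuracy = sum(toy_matrix[i][i] for i in range(len(labels))) / 30  # correct / total
```

In the multi-class setting, overall accuracy is taken here as the number of correctly classified samples (the matrix diagonal) divided by the total number of samples.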
4. Result and Discussion
This section discusses the results of each step described in the methodology section, starting with data collection.
As mentioned, the data was collected by scraping Twitter. The scraped tweets contained
keywords such as "#PembelajaranTatapMuka", "BelajarTatapMuka", "#belajartatapmuka", and "#covid19".
A total of 1500 tweets were scraped. A sample of the scraped data can be seen in Table 1:
Table 1. Dataset Scraped from Twitter
Tweet
Kanit Binmas Iptu Moh.Widayadi Sanani dan Panit Binmas Aiptu Edy Rusmadi berkoordinasi dengan Kepala Sekolah SMAN 1 Piyungan terkait
dengan Pembelajaran Tatap Muka (PTM) yang akan dilaksanakan pada Senin tanggal 4 Oktober 2021.
Rabu, 29-9-2021. https://t.co/bwH9SLmxPN
Unicef mengapresiasi langkah dan upaya yang diambil oleh Gubernur Ganjar Pranowo dalam pelaksanaan Pembelajaran Tatap Muka (PTM) di
Jawa Tengah https://t.co/zQxRc6mCqu
Kanit Binmas Iptu Moh.Widayadi Sanani dan Panit Binmas Aiptu Edy Rusmadi berkoordinasi dengan Kepala Sekolah SMAN 1 Piyungan terkait
dengan Pembelajaran Tatap Muka (PTM) yang akan dilaksanakan pada Senin tanggal 4 Oktober 2021.
Rabu, 29-9-2021. https://t.co/Ucmf62HmdA
@AndamariUsmani Kepada para kepala sekolah saya persilakan untuk segera melakukan kegiatan pembelajaran tatap muka terbatas dengan syarat
wilayah sekolah tersebut minimal sudah pada level 3 PPKM dan vaksinasi Covid 1 Jajanan Manis Pasar Tebet
After all data is collected, the next stage is preprocessing. At this stage, several steps were taken to produce clean
text ready for use. Only the training data will be cleaned. First, the terms were case folded to lowercase: terms such as
"BERITA", "BeRiTa", or "berita" all become the term "berita". An example of
case folding can be seen in Table 2:
Table 2. Case Folding
Tweet: Kanit Binmas Iptu Moh.Widayadi Sanani dan Panit Binmas Aiptu Edy Rusmadi berkoordinasi dengan Kepala Sekolah SMAN 1 Piyungan terkait dengan Pembelajaran Tatap Muka (PTM) yang akan dilaksanakan pada Senin tanggal 4 Oktober 2021.
Case Folding: kanit binmas iptu moh.widayadi sanani dan panit binmas aiptu edy rusmadi berkoordinasi dengan kepala sekolah sman 1 piyungan terkait dengan pembelajaran tatap muka (ptm) yang akan dilaksanakan pada senin tanggal 4 oktober 2021
After the terms were case folded, each sentence was tokenized, that is, separated into an array
of terms. White space was used as the splitting criterion. An example of tokenizing can be seen in Table 3:
Table 3. Tokenized Process
Tweet: kanit binmas iptu moh.widayadi sanani dan panit binmas aiptu edy rusmadi berkoordinasi dengan kepala sekolah sman 1 piyungan terkait dengan pembelajaran tatap muka (ptm) yang akan dilaksanakan pada senin tanggal 4 oktober 2021.
Tokenizing: ["kanit", "binmas", "iptu", "moh", "widayadi", "sanani", "dan", "panit", "binmas", "aiptu", "edy", "rusmadi", "berkoordinasi", "dengan", "kepala", "sekolah", "sman", "1", "piyungan", "terkait", "dengan", "pembelajaran", "tatap", "muka", "ptm", "yang", "akan", "dilaksanakan", "pada", "senin", "tanggal", "4", "oktober", "2021"]
The next step was removing stopwords. For this step, NLTK library was used, and the language was set to
"indonesian" to use a stopwords list from the Indonesian language. Table 4 shows the stopword removal step.
Table 4. Stopword Removal
Tokenizing: ["kanit", "binmas", "iptu", "moh", "widayadi", "sanani", "dan", "panit", "binmas", "aiptu", "edy", "rusmadi", "berkoordinasi", "dengan", "kepala", "sekolah", "sman", "1", "piyungan", "terkait", "dengan", "pembelajaran", "tatap", "muka", "ptm", "yang", "akan", "dilaksanakan", "pada", "senin", "tanggal", "4", "oktober", "2021"]
Stopword Removal: ["kanit", "binmas", "iptu", "moh", "widayadi", "sanani", "panit", "binmas", "aiptu", "edy", "rusmadi", "berkoordinasi", "kepala", "sekolah", "sman", "1", "piyungan", "pembelajaran", "tatap", "muka", "ptm", "dilaksanakan", "senin", "tanggal", "4", "oktober", "2021"]
The next step was stemming. Stemming reduces multiple terms containing suffixes or
prefixes to the same base form. An example of stemming can be seen in Table 5:
Table 5. Stemming Step
Tokenizing: ["kanit", "binmas", "iptu", "moh", "widayadi", "sanani", "dan", "panit", "binmas", "aiptu", "edy", "rusmadi", "berkoordinasi", "dengan", "kepala", "sekolah", "sman", "1", "piyungan", "terkait", "dengan", "pembelajaran", "tatap", "muka", "ptm", "yang", "akan", "dilaksanakan", "pada", "senin", "tanggal", "4", "oktober", "2021"]
Stemming: ["kanit", "binmas", "iptu", "moh", "widayadi", "sanani", "panit", "binmas", "aiptu", "edy", "rusmadi", "koordinasi", "kepala", "sekolah", "sman", "1", "piyungan", "ajar", "tatap", "muka", "ptm", "laksana", "senin", "tanggal", "4", "oktober", "2021"]
All the training set data was vectorized using TF-iDF. The vectorizer was not fitted on the test data because the test
data represents unseen data to our model. After vectorization, the model was trained on the training data
using KNN. As mentioned above, the K value is 3. The distance used is Euclidean distance. The model was then
evaluated using a confusion matrix. The confusion matrix can be seen in Figure 2.
Fig. 2. Confusion Matrix
Based on Figure 2, the false positive (FP), false negative (FN), true positive (TP), and true negative (TN) counts for
each class can be seen in Table 6. From these values, we can calculate the F1-score and recall for each
class, and the accuracy of the model over all classes.
As shown in Table 6, the recall for all classes showed promising results. However, the neutral class showed
the lowest recall while having the most training data. For precision, the lowest value belongs to the
positive class. The F1-score, the harmonic mean of precision and recall for each class, showed promising results
(above 70%). Figure 3 shows a clearer comparison of each metric across classes.
Table 6. FP, FN, TP, and TN values for each class
                    Class Positive   Class Neutral   Class Negative
False Negative      27               13              12
False Positive      9                36              7
True Negative       208              139             203
True Positive       57               113             79
Recall Score        86.364%          75.839%         91.860%
Precision           67.857%          89.683%         86.813%
F1-Score            76.000%          82.182%         89.266%
Accuracy All Class  82.724%
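As a worked check, the F1-scores and the overall accuracy can be recomputed from the counts in Table 6. The sketch below assumes the standard definitions of precision (TP / (TP + FP)) and recall (TP / (TP + FN)) and assumes that accuracy was computed as the sum of true positives over the test-set size; the variable names are illustrative.

```python
# Counts copied from Table 6.
table6 = {
    "positive": {"TP": 57, "FN": 27, "FP": 9, "TN": 208},
    "neutral": {"TP": 113, "FN": 13, "FP": 36, "TN": 139},
    "negative": {"TP": 79, "FN": 12, "FP": 7, "TN": 203},
}

def f1_from_counts(c):
    # Algebraically equal to 2 * precision * recall / (precision + recall).
    return 2 * c["TP"] / (2 * c["TP"] + c["FP"] + c["FN"])

f1_scores = {name: round(100 * f1_from_counts(c), 3) for name, c in table6.items()}

# Each class's four counts sum to the size of the test set.
test_size = sum(table6["positive"].values())
accuracy = round(100 * sum(c["TP"] for c in table6.values()) / test_size, 3)

# TP / (TP + FP) and TP / (TP + FN) per class, in percent.
ratios = {name: (round(100 * c["TP"] / (c["TP"] + c["FP"]), 3),
                 round(100 * c["TP"] / (c["TP"] + c["FN"]), 3))
          for name, c in table6.items()}
```

The recomputed F1-scores (76.000%, 82.182%, 89.266%) and accuracy (82.724%, from 249 correct out of 301 test tweets) agree with the table. Under the standard definitions, however, TP / (TP + FP) yields the figures printed in the Recall row and TP / (TP + FN) yields those printed in the Precision row, so either the FP and FN rows or the precision and recall rows may be transposed in the table as printed; F1 and accuracy are unaffected because they are symmetric in FP and FN.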
Fig. 3. Metric Value for each class (bar chart comparing Precision, Recall, and F1-Score for Class Positive, Class Neutral, and Class Negative; the plotted values are those listed in Table 6)
The overall higher values of the negative class may result from its large amount of data. While the neutral class has the
highest amount of data, recall is the only metric for which it has the highest value. This may result from the neutral
class having the most data. Based on these findings, having the largest amount of data does not necessarily make a class
score the highest in sentiment analysis. Interestingly, the positive class, which was only a little more than half the
size of the neutral class, showed a higher precision than the neutral class. While false negative cases were higher in
the positive class, which has the fewest data, the neutral class showed the lowest precision, which means that its number
of false positives is much higher than that of the other classes. This is likely also a result of the neutral class being
much larger than the others, which makes the model misclassify minority-class samples as neutral. Future research may try
to handle this class imbalance to produce a more robust machine learning model. Balancing the data may also improve the
recall of the positive and negative classes.
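One simple way to balance the classes is random oversampling, in which samples from the smaller classes are duplicated at random until every class matches the largest one. The sketch below is only an illustration of that idea, not a method used in this study; the function name, seed, and toy labels are assumptions.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate random minority-class samples until all classes are equally large."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(items + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels

# Toy imbalanced data (hypothetical): 6 neutral, 5 contra, 3 pro.
labels = [0] * 6 + [-1] * 5 + [1] * 3
samples = [f"tweet {i}" for i in range(len(labels))]
balanced_samples, balanced_labels = random_oversample(samples, labels)
class_sizes = Counter(balanced_labels)
```

Oversampling would normally be applied to the training split only, so that duplicated tweets cannot also appear in the test data.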
5. Conclusion
Research on sentiment analysis for online learning using KNN combined with TF-iDF vectors on Twitter posts in
Bahasa Indonesia has been conducted. The results were promising, with an F1-score above 70% for each class and an
accuracy of 82%. By taking new corpus data through random sampling of a certain number of tweets with a time
label, classifying them, and counting the amount of data classified into each class, we can estimate the sentiment
at that time. The limited accuracy in this study is due to the insufficient amount of data in each class; a possible
solution is to add more data for each class and arrange it so that the classes are balanced. The data also
show that the positive class has fewer samples than the other classes. This is because, during the pandemic,
many parents of students feared the transmission of Covid-19; this phenomenon can be seen
in the size of the negative class.
While the results seem good overall, several things can be improved. First, the positive class still suffers from having
less data than the others; oversampling can be used to address this. Second, a comparison with contextual embeddings, such
as Sentence-BERT or other transformer-based sentence embeddings, may improve the overall model performance.
References
[1] O. B. Adedoyin and E. Soykan, "Covid-19 pandemic and online learning: the challenges and opportunities,"
https://doi.org/10.1080/10494820.2020.1813180, 2020, doi: 10.1080/10494820.2020.1813180.
[2] V. H. Valentino, H. S. Setiawan, M. T. Habibie, R. Ningsih, D. Katrina, and A. S. Putra, "Online And Offline Learning Comparison In The
New Normal Era," International Journal of Educational Research and Social Sciences (IJERSC), vol. 2, no. 2, pp. 449–455, May 2021, doi:
10.51601/IJERSC.V2I2.73.
[3] F. H. Rachman, Imamah, and B. S. Rintyarna, "Sentiment Analysis of Madura Tourism in New Normal Era using Text Blob and KNN with
Hyperparameter Tuning," 2021 International Seminar on Machine Learning, Optimization, and Data Science, ISMODE 2021, pp. 23–27,
2022, doi: 10.1109/ISMODE53584.2022.9742894.
[4] Y. Qi et al., "Sentiment analysis on customer satisfaction of digital payment in Indonesia: A comparative study using KNN and Naïve
Bayes," J Phys Conf Ser, vol. 1444, no. 1, p. 012034, Jan. 2020, doi: 10.1088/1742-6596/1444/1/012034.
[5] S. S. Istia and H. D. Purnomo, "Sentiment analysis of law enforcement performance using support vector machine and K-nearest neighbor,"
Proceedings - 2018 3rd International Conference on Information Technology, Information Systems and Electrical Engineering, ICITISEE
2018, pp. 84–89, Jul. 2018, doi: 10.1109/ICITISEE.2018.8720969.
[6] S. Kaur, G. Sikka, and L. K. Awasthi, "Sentiment Analysis Approach Based on N-gram and KNN Classifier," ICSCCC 2018 - 1st
International Conference on Secure Cyber Computing and Communications, pp. 13–16, Jul. 2018, doi: 10.1109/ICSCCC.2018.8703350.
[7] H. El Hannach and M. Benkhalifa, "Using Synonym and Definition WordNet Semantic relations for implicit aspect identification in
Sentiment Analysis," ACM International Conference Proceeding Series, vol. Part F148154, 2019, doi: 10.1145/3320326.3320406.
[8] B. Pooja and S. Jaswinder, "A Study on Classification Techniques based on Opinions," IOP Conf Ser Mater Sci Eng, vol. 1022, no. 1, p.
012091, Jan. 2021, doi: 10.1088/1757-899X/1022/1/012091.
[9] S. Assem and S. Alansary, "Sentiment Analysis From Subjectivity to (Im)Politeness Detection: Hate Speech From a Socio-Pragmatic
Perspective," 2022 20th International Conference on Language Engineering (ESOLEC), pp. 19–23, Jan. 2022, doi:
10.1109/ESOLEC54569.2022.10009298.
[10] S. Uddin, I. Haque, H. Lu, M. A. Moni, and E. Gide, "Comparative performance analysis of K-nearest neighbour (KNN) algorithm and its
different variants for disease prediction," Sci Rep, vol. 12, no. 1, Dec. 2022, doi: 10.1038/S41598-022-10358-X.
