
Stance Detection of Political Tweets with Transformer Architectures
Pranav Gunhal, Homestead High School, Cupertino, CA (pranav.gunhal@gmail.com)
Aditya Bashyam, Irvington High School, Fremont, CA (adityabashyam05@gmail.com)
Kelly Zhang, Brooklyn Technical High School, New York, NY (kellyzhang338@gmail.com)
Alexandra Koster, Eleanor Roosevelt High School, New York, NY (alexkosternyc@gmail.com)
Julianne Huang, Staten Island Technical High School, New York, NY (juliannehuang17@gmail.com)
Neha Haresh, The Lyceum School, Karachi, Sindh (nehaharesh17@gmail.com)
Rudransh Singh, Bellarmine College Prep, San Jose, CA (rudransh.singh18@gmail.com)
Michael Lutz, UC Berkeley, Berkeley, CA (michaeljlutz@berkeley.edu)

Abstract—The online actions and words of a person can reveal their political sentiments and how they may vote at the polls. For decades, the dominant strategy for determining voter sentiment on policies relied on slow and often inaccurate polling. The creation and subsequent popularity of numerous social media sites, namely Twitter, has presented an opportunity for researchers to apply machine learning models to identify voter stances towards relevant political issues. Stance detection is a sub-task of natural language processing that involves algorithmically determining the stance that a text expresses towards a given topic. With recent developments in NLP models and architectures, prior researchers have successfully trained stance detection models to predict the winning candidates in national-level elections. However, the viability of stance detection towards specific policies in city-level and state-level elections is relatively unexplored. In this paper, we train a novel transformer neural network architecture that accurately classifies Twitter users' stances towards Proposition 16 of California's 2020 election. To that end, we created a novel annotated data set of tweets regarding Proposition 16. Because not all tweets were opinionated, we also trained a model to filter out irrelevant and neutral tweets. Ultimately, we achieved 82% overall accuracy. Moreover, our model accurately predicted the result of the Prop 16 election. Our results show that a stance detection model can provide a unique perspective to help politicians make decisions that represent their constituents.

Index Terms—policy-based elections, sentiment analysis, transformer learning, stance detection, neural network, RoBERTa-base

I. INTRODUCTION

For decades, politicians have used polls to determine a population's views and feelings on their platforms and campaigns, yet these polls are often ineffective and inefficient [13], [18]. The 2020 United States presidential election polls predicted a close race between Joe Biden and Donald Trump, even though the former ultimately won by a large margin [17]. However, almost $600 million was spent on Strategy and Research, which includes polls [19]. With the surge of social media use in recent years, politicians have attempted to utilize data from social sites, namely Twitter, to gather information on which to base their campaign strategies. With over 425 million current users [14], the microblogging site contains substantial data regarding political issues in a concise form. As [15] notes, exposure to political sentiments online impacts a person's political behaviors. Previous works have revolutionized polling by using Twitter and numerous natural language processing models to predict election outcomes. However, these models focus almost exclusively on national-level elections, such as the presidential election, leaving out smaller policy-based elections with considerable impact. These elections can often impact a political campaign in the long run, as they allow politicians to understand their constituency better. Controversial policies in local-level elections commonly receive substantial real-time attention on Twitter, and data sets containing strong support for either side as well as neutral opinions on an issue can be used to build models that accurately predict the final election results.

Previous works, including a prior study conducted by Darwish et al. [3], have presented quantitative and qualitative analyses of the top retweeted tweets, essentially those that went viral, pertaining to the 2016 US presidential election over the three-month period leading up to the election. Analyzing the most viral tweets pertaining to an election typically reveals the topics that garnered the most attention on Twitter. Shi et al. [4] analyzed millions of tweets to predict public opinion towards the Republican presidential primaries during the 2012 US presidential election, training a linear regression model to produce results rivaling those of the polls. Budiharto et al. [15] employed a similar approach, analyzing a large data set of tweets to accurately predict the outcome of the 2019 Indonesian presidential election. Other works improved the efficiency of such models using novel architectures, such as [16], [2], the former using Recursive Neural Networks on political literature and the latter using Temporal Convolutional Networks as a replacement for Long Short-Term Memory models in the detection of fake news. The vast majority of such studies, catering almost solely to candidate-based elections, overlook more localized policy-based elections that would take into account Twitter data from a more specific demographic group. For instance, studies such as [12] highlight that local
elections attract a higher amount of participation and interest from homeowners, the wealthy, and the elderly. It is important to note that such an audience would have a more prominent social media presence and would thus produce useful Twitter data in large volumes.

Over the past few years, transformer architectures [6] have become the dominant architecture for stance detection tasks. [7], [2] applied Temporal Convolutional Networks and transformer-based models, BERT [5] and XLNet, to NLP tasks. The BERT embeddings had better performance than XLNet, a model based on Transformer-XL. The study concluded that TCNs outperformed RNN networks and that the BERT Independent models provided better training accuracy, although they presented slightly anomalous results and began to overfit. Some studies have deduced, by comparing the performance of the prediction models they create, that traditional approaches, like SVM and logistic regression, may perform the most effective stance detection for election outcome predictions [7]. On the contrary, other studies conclude that traditional approaches may only outperform initial deep learning trials because of the convenience of feature-based methods [8]. Research with successful stance detection models uses deep learning architectures, such as neural networks, and trains the models diligently [9]-[11]. More recent studies have supported that deep learning approaches, such as neural networks, and ensembling are effective and outperform traditional models. [25] reviews deep ensemble learning and asserts that, while deep learning models may have issues due to computational costs and degradation problems, deep learning architectures have still demonstrated better performance than shallow or traditional models.

Similar to the work done in [4] regarding how social media was used to gauge public opinion, our work aims to use neural network architectures to determine dominant stances on specific localized policies so that pollsters can develop adequate campaigning strategies. More specifically, we seek to determine whether a tweet is relevant and opinionated regarding a political issue. In the context of an election, we aim to classify tweets as unrelated, related but neutral, related and negative, or related and positive.

The structure of this work is as follows. In Section II, we present details about our data set and models, including the process through which they are implemented. We then present our experimental results in Section III. To conclude, in Section IV, we provide a detailed analysis of the results obtained and examine the future applications of our work.

II. METHODOLOGY

A. Data Collection

We used the Twitter API to gather a data set of 5,000 tweets relating to California's 2020 Prop 16 election, wherein we used the following Twitter attributes: text, place, created_at, user, retweet_count, favorite_count, and entities.
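As a rough illustration of this collection step, the sketch below pulls tweets matching a Prop 16 query and keeps the attributes listed above. It is a minimal sketch rather than our exact pipeline: the tweepy client, the query string, the tweet count, and the credential placeholders are assumptions.

import csv
import tweepy

# Illustrative credentials and query, not the exact values used in this work.
auth = tweepy.OAuth1UserHandler("API_KEY", "API_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

rows = []
for status in tweepy.Cursor(api.search_tweets,
                            q='"Prop 16" OR "Proposition 16"',
                            lang="en", tweet_mode="extended").items(5000):
    rows.append({
        "text": status.full_text,
        "place": status.place.full_name if status.place else None,
        "created_at": status.created_at.isoformat(),
        "user": status.user.screen_name,
        "retweet_count": status.retweet_count,
        "favorite_count": status.favorite_count,
        "entities": status.entities,   # hashtags, URLs, and mentions
    })

with open("prop16_tweets.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)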
Then we completed data annotations manually to determine the relevance and stance of 3,500 tweets, each of which was annotated by domain experts. Each tweet was labeled either 'No', 'Yes', 'Unrelated', or 'Comment'. Tweets were labeled 'Yes' if they expressed support or appreciation for Prop 16, whether through the use of hashtags, a simple stance assertion, or the sharing of supporting media or links. Tweets were labeled 'No' if they expressed disapproval of or hostility towards Prop 16, again whether through the use of hashtags, a simple stance assertion, or the sharing of supporting media or links. Tweets were labeled 'Comment' if they included material that was relevant to Prop 16 but was neutral or simply a comment without any position or stance assertion. Lastly, tweets were labeled 'Unrelated' if they contained material completely irrelevant to California's Prop 16. Table I shows a few sample labeled tweets.

TABLE I
EXAMPLE OF DATA ANNOTATIONS

Label       Tweet
Yes         "California Voters: PLEASE remember to vote YES on Prop 16, which would ban touching chinas!"
No          "Replying to @asiansforaa If UC will continue to use 14 admissions. Why 16? Prop 16 makes more privilege. I don't believe your smoke bomb."
Unrelated   "NBA BUBBLE PLAY DALLS MAVS PLAYER PROP DONCIC OVER 16.5 RBS/AST - 120 BOL LETS GET THAT MONEY! #nba #NBABubble #NBAISBACK #nbatwitter #makingyiumoney @Makingyoumoney1"
Comment     "Replying to @MattBoxer94 What is prop 16? You can tell me to google it if you don't want to explain. I am just making dinner for my kids so curious."

The annotators were equipped with sufficient knowledge regarding California elections and made sure to check the content of each tweet carefully, including any hashtags, images, videos, or external links, to ensure the data was annotated accurately. More complex tweets made about the proposition, such as satirical comments, were analyzed thoroughly to determine their true stance. Each of the authors annotated a random sample of 50 tweets from the data set that they had not previously annotated, and compared their annotations to those made by another author on the same data to verify the accuracy of the annotation using Cohen's Kappa.

Overall, all seven annotators fully agreed on 91% of the tweets and disagreed on the remaining 9%. The inter-rater reliability, measured using Cohen's Kappa, is 0.85, which signifies near-perfect agreement. The ability to achieve a high score demonstrates both a good data set gathered through the Twitter API and a lack of bias amongst annotators.
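For reference, pairwise agreement of this kind can be computed directly with scikit-learn; the short sketch below assumes two annotators' labels for the same sample of tweets are stored in parallel lists (the example labels are illustrative).

from sklearn.metrics import cohen_kappa_score

# Labels assigned by two annotators to the same tweets (illustrative values).
annotator_a = ["Yes", "No", "Comment", "No", "Unrelated", "No"]
annotator_b = ["Yes", "No", "Comment", "No", "Unrelated", "Yes"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
agreement = sum(a == b for a, b in zip(annotator_a, annotator_b)) / len(annotator_a)
print(f"raw agreement = {agreement:.2f}, Cohen's kappa = {kappa:.2f}")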
In addition, we augmented the data using Google Translate to further increase the accuracy of the model. This method is highly effective in providing slightly changed versions of a data set, allowing for more training data [21]. A portion of the tweets was randomly chosen and passed into the translator, which translated it into one of the following languages: Arabic, Bulgarian, French, German, Greek, Hindi, Russian, Simplified Chinese, Spanish, Swahili, Thai, Turkish, Urdu, or Vietnamese. The altered text was then translated back to English, providing a different, albeit similar, version of the tweet.
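This round-trip (back-translation) step can be sketched as follows. The translate() helper is a hypothetical stand-in for whatever translation client is available (we used Google Translate); the pivot-language codes mirror the list above, and the sampling fraction is an assumption.

import random

PIVOT_LANGS = ["ar", "bg", "fr", "de", "el", "hi", "ru", "zh-CN",
               "es", "sw", "th", "tr", "ur", "vi"]

def translate(text: str, src: str, dest: str) -> str:
    """Hypothetical wrapper around a translation service call."""
    raise NotImplementedError

def back_translate(tweet: str) -> str:
    pivot = random.choice(PIVOT_LANGS)                 # pick one pivot language
    foreign = translate(tweet, src="en", dest=pivot)   # English -> pivot
    return translate(foreign, src=pivot, dest="en")    # pivot -> English

def augment(examples, fraction=0.3):
    """examples: list of (text, label) pairs; returns new augmented pairs."""
    chosen = random.sample(examples, int(len(examples) * fraction))
    return [(back_translate(text), label) for text, label in chosen]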
B. Pre-processing

The tweets were pre-processed using a tokenizer before being passed into the model. The tokenizer for the base model we utilized was a byte-level BPE. Its representation of text allows for complete retention of a text in a compact manner, while also not requiring out-of-vocabulary tokens.
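As an illustration, the byte-level BPE tokenizer that ships with RoBERTa can be applied as follows; this is a sketch using the Hugging Face transformers library, and the sequence length is an assumption.

from transformers import RobertaTokenizerFast

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")   # byte-level BPE

tweets = ["Vote YES on Prop 16!", "What is prop 16?"]
encoded = tokenizer(tweets,
                    truncation=True,       # cut tweets longer than max_length
                    padding="max_length",  # pad shorter tweets
                    max_length=128,        # assumed maximum sequence length
                    return_tensors="pt")

print(encoded["input_ids"].shape)          # torch.Size([2, 128])
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"][0][:8]))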
C. Models

We used multiple pre-trained transformer learning models as the base model for this classifier. BERT was used as the primary baseline. Its revolutionary bidirectional capabilities allow it to be pre-trained on Masked Language Modeling and Next Sentence Prediction. The former predicts a random selection of input tokens that have been replaced with a mask token, concurrently learning higher-order distributional statistics and therefore making the model more useful for future finetuning [24].

We then experimented with RoBERTa-base, which utilizes dynamic masking in lieu of BERT's static masking. Its vast training data consisted of BookCorpus, which contains 11,038 books, English Wikipedia, over 124 million tweets, and tens of millions of articles from CC-News [21]. RoBERTa is optimized to include larger batches of data and removes the need for Next Sentence Prediction, which reduces the performance of bases like BERT [20].

XLNet was also used in the finetuning of the model. The model was trained using BookCorpus, English Wikipedia, Giga5, ClueWeb, and Common Crawl, resulting in a combined 158 gigabytes of training data. Its algorithm rivals that of BERT by a comparable margin [22], allowing for the data to be trained using a novel architecture.

DistilBERT is trained on the same data as BERT, while also implementing an algorithm that runs 60% faster than the latter, with 95% of its accuracy. Its efficiency allows for a faster, more robust training process; consequently, it was utilized in the training of the model.

Trained on English Wikipedia, BookCorpus, OpenWebText, content from Reddit, and a subset of CommonCrawl (STORIES), DeBERTa improves on both BERT and RoBERTa through the implementation of two novel techniques: a disentangled attention mechanism and an enhanced mask decoder. The former encodes words and computes their attention weights using two vectors in relation to content and relative position, and the latter makes use of absolute positions in the decoding layer to assist in pre-training [23].
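Each of these backbones was fine-tuned as a sequence classifier on the annotated tweets. The sketch below shows the general shape of such a run with the Hugging Face Trainer, using the hyperparameters reported later (10 epochs, weight decay 0.01, AdamW); the toy examples and the checkpoint name are placeholders, not our exact setup.

import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABELS = {"Yes": 0, "No": 1, "Comment": 2}
texts = ["Vote YES on Prop 16!", "Prop 16 is a terrible idea.", "What is prop 16?"]
labels = [LABELS["Yes"], LABELS["No"], LABELS["Comment"]]

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

class TweetDataset(torch.utils.data.Dataset):
    """Wraps tokenized tweets and integer labels for the Trainer."""
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=3)
args = TrainingArguments(output_dir="stance-model",
                         num_train_epochs=10,
                         per_device_train_batch_size=8,
                         weight_decay=0.01)   # AdamW is the Trainer default
Trainer(model=model, args=args, train_dataset=TweetDataset(enc, labels)).train()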
D. Relevancy Classifier

The sentiment analysis of the tweets was performed in two stages, the first being the classification of the tweets on the basis of relevancy. Because the data was collected using general search terms from a large database, not all tweets collected were relevant to the topics discussed. Unrelated tweets often contained the words "Prop" or "16" in different contexts, or were found to be the result of spam bots. Meanwhile, related tweets consisted of positive, negative, and neutral sentiments about Prop 16 and were often paired with corresponding hashtags. The model was trained on the data for 10 epochs with the AdamW optimizer and a training and test batch size of 16.

E. Modular Stance Detection

The related tweets were then used to train a stance detection model to predict the user's stance towards Prop 16. There were three stances the tweets on Prop 16 could take: "Yes" (supporting the proposition), "No" (against the proposition), and "Comment" (neutral on the proposition). Since the model detects the stance towards Prop 16, we dropped any data annotated as "Unrelated" to increase the accuracy of the model and avoid using data that would not discuss Prop 16. After filtering through the data, we had approximately 2,270 tweets to train on.

Preliminary training revealed heavy biases in the training data. About 65% of all tweets were labeled as negative, while 20% were positive and 15% were neutral. Thus, training on this unbalanced data set led to biased results and a low F1 score. When the negative tweets were undersampled to balance the data, the loss in quantity decreased the accuracy of the model, although the F1 score increased slightly. We thus found it favorable to utilize the data augmentation methods mentioned previously to oversample both the comment and positive tweets, leading to a roughly equal ratio of all categories. This resulted in the model having both a high accuracy and a high F1 score, due to the quality and quantity of the training data.
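A label-balancing step of this kind can be sketched as a simple random oversampler: minority classes are resampled (ideally with back-translated variants rather than exact duplicates) until each label matches the majority count. The function below is illustrative rather than our exact procedure.

import random
from collections import defaultdict

def oversample(examples):
    """examples: list of (text, label) pairs; returns a label-balanced list."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    target = max(len(items) for items in by_label.values())
    balanced = []
    for label, items in by_label.items():
        balanced.extend(items)
        balanced.extend(random.choices(items, k=target - len(items)))
    random.shuffle(balanced)
    return balanced

# Roughly the class ratio reported above: 65% No, 20% Yes, 15% Comment.
data = ([("No on 16.", "No")] * 65 + [("Yes on 16!", "Yes")] * 20
        + [("What is 16?", "Comment")] * 15)
counts = {lbl: sum(1 for _, l in oversample(data) if l == lbl)
          for lbl in ("Yes", "No", "Comment")}
print(counts)   # each label now has 65 examples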
We first created two binary classification models, one to detect whether a tweet was neutral or not, and one to detect whether it was in favor of Prop 16 or not. If a tweet was predicted to be not neutral (not a comment), it was passed into the stance detection classifier. The neutrality detection model did not perform as well as the stance detection model performed on average, partially due to the subtlety of neutrality in language. This led to low accuracy for the combined model.

Another architecture we implemented was a ternary classification model, which predicted neutrality and stance simultaneously. The model required more training data than the two binary models, which was obtained using data augmentation. The ternary model performed significantly better with a balanced train data set, and it outperformed the two binary classification models.

The models mentioned above were all trained for 10 epochs on the train data set, with a weight decay value of 0.01 and a batch size of 8.
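At inference time, the modular pipeline chains the relevancy classifier with the ternary stance model; the sketch below shows that chaining and the joint-accuracy notion reported in Section III. Both predict_* functions are placeholders for the fine-tuned classifiers described above.

def predict_relevant(text: str) -> bool:
    """Placeholder for the fine-tuned relevancy classifier."""
    raise NotImplementedError

def predict_stance(text: str) -> str:
    """Placeholder for the ternary stance model; returns 'Yes', 'No', or 'Comment'."""
    raise NotImplementedError

def classify(text: str) -> str:
    # Stage 1: screen for relevancy; stage 2: predict the stance of related tweets.
    if not predict_relevant(text):
        return "Unrelated"
    return predict_stance(text)

def joint_accuracy(examples):
    """examples: (text, gold_label) pairs with gold in {Yes, No, Comment, Unrelated}."""
    correct = sum(classify(text) == gold for text, gold in examples)
    return correct / len(examples)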
F. Quaternary Classifier

In addition to creating separate models for relevancy, neutrality, and stance detection, we also experimented with the creation of a single quaternary classification model. In essence, this model functions the same as the ternary classification model mentioned above, but it also attempts to predict the relevancy of a tweet. When preliminary training was executed, the accuracy and F1 scores were consistently low because of the imbalanced nature of the data set. When data augmentation was used to oversample unrelated, neutral, and positive tweets to match the quantity of the negative tweets, the accuracy increased greatly, producing accuracies comparable to the modular, multi-model approach. All models described above were trained on the augmented data set for varying numbers of epochs.

III. RESULTS

We created numerous joint and separate models in this work, and trained them on differing sets of data.

A. Relevancy Detection

We passed the data set through multiple models, of which BERT outperformed the others by a slight margin on the test data, which was not visible to the model during training. The maximum accuracy achieved was 0.982, by the BERT-base model, which also had the highest F1 score of 0.990, as shown in Table II. All models achieved higher accuracies than the baseline, 0.917, and required relatively little training, averaging 10 epochs for maximum performance.

TABLE II
RELEVANCY CLASSIFIER MODEL

Model        Accuracy   F1 score
BERT         0.982      0.990
RoBERTa      0.974      0.974
XLNet        0.947      0.949
DistilBERT   0.965      0.966
DeBERTa      0.956      0.956
Baseline     0.917      0.751

B. Stance Detection

All stance detection models trained were able to exceed the baseline accuracy of 0.5759.

The ternary stance detection models greatly outperformed the combined binary comment and stance detection models. Of these, RoBERTa-base had the highest accuracy, 0.8408, as illustrated by Table III. After the ternary classification models were tested on the output of the relevancy classification model, the former still had the highest joint accuracy, 0.7880. This is slightly higher than the final joint accuracy of XLNet, 0.7848. Due to the randomization of the train data set for all of the models, this minor difference in accuracy can be disregarded.

TABLE III
THE ACCURACIES OF TRAINING DIFFERENT MODELS ON THE TRAIN DATA

Relevancy and ternary classification
Model        Accuracy*   Training loss*   Joint accuracy
RoBERTa      0.8408      0.0032           0.7880
XLNet        0.8372      0.0093           0.7848
DistilBERT   0.8121      0.0021           0.7627
BERT         0.7979      0.0002           0.7500
DeBERTa      0.7496      0.7339           0.7072

Four-way classification
Model        Accuracy*   Training loss*   Joint accuracy
RoBERTa      0.8212      0.0091           0.8212
XLNet        0.7990      0.0992           0.7990
BERT         0.7737      0.0990           0.7737
DistilBERT   0.7722      0.0961           0.7722
DeBERTa      0.7547      0.099            0.7547

Existing methods (no data augmentation)
Baseline     —           —                0.5759

* Accuracy and final training loss for the independent model

The quaternary classification model, however, had the highest final accuracy of all the architectures, at 0.8212. This model also utilizes RoBERTa-base. We found that the model predicted outputs with a reasonable amount of error according to the distribution of the test data set. The quantity of predictions for each label roughly corresponds with the actual quantity, which shows the lack of noticeable bias in the model.

RoBERTa may have outperformed the other models in both model pipelines because of its unique training data set, which included over 124 million tweets. This likely contributed to the model classifying the validation tweets better than the other models.

C. Election Prediction

Upon examining the results from our best-performing model, we were able to determine that Prop 16 would have failed in 2020. The vast majority of the predictions were against the proposition, and the remaining predictions were split between unrelated, neutral, and positive tweets. According to our model, the percentage of negative sentiment is projected to be about 66%; this is only slightly higher than the actual percentage of 57%. While this difference is not very significant, it does imply a stronger algorithmic bias towards negative tweets in this specific context.
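The election-level projection is then simply the share of each predicted label over the collected tweets, for example the fraction of "No" predictions among opinionated tweets. A sketch of that aggregation is shown below; the classify argument stands for any tweet-level classifier, such as the pipeline sketched in Section II-E.

from collections import Counter

def project_outcome(tweets, classify):
    """classify: callable mapping tweet text to 'Yes', 'No', 'Comment', or 'Unrelated'."""
    counts = Counter(classify(text) for text in tweets)
    opinionated = counts["Yes"] + counts["No"]            # ignore Comment / Unrelated
    share_no = counts["No"] / opinionated if opinionated else 0.0
    return counts, share_no

# On our data the projected "No" share is roughly 0.66, versus the actual 57% No vote.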
IV. CONCLUSION AND FUTURE WORK

In this work, we aimed to use NLP techniques and transformer architectures for local election prediction as a potential alternative to traditional polls. Traditional polling has been proven to be costly and, often, ineffective and inaccurate. To achieve our goals, we made two types of models and obtained suitable accuracy scores. The relevancy classifier proved able to determine whether a tweet is relevant to Prop 16, while the stance detection models could determine the opinions of Twitter users based on their tweets regarding Prop 16. Our determination from the models that Prop 16 would have failed was correct, as the proposition did not pass, with roughly 7.2 million votes for Yes and 9.7 million votes for No (57% against), a margin similar to our model predictions (66% against). Further research may be done as to how exactly the under-representation of less privileged groups, such as non-English speakers and the less wealthy, affects prediction performance.

Social media is a powerful tool for investigating how people interact with each other, but sentiment analysis is not always accurate due to the intricacies of human language. In the future, as technology develops to identify such nuances, including sarcasm, emotion, and irony, with greater accuracy, opinion mining, or sentiment analysis, will expand into larger societal applications. In this paper, we dove specifically into how NLP can be used in the political world to determine the dominant stances on certain localized policies. However, opinion mining can be applied in numerous other industries, such as healthcare, technology, sports, and marketing. For instance, in the healthcare industry, Knowledge Discovery and Data Mining is already taking on a prominent role in identifying fraud and abuse by flagging unusual medical claims outside of the set norms. These findings may have broader implications for fields outside of healthcare, as detecting fraud and abuse is a common issue. Further research can be done on the use of NLP for common field-specific issues.

Because we were able to predict the outcome of the 2020 Prop 16 election, should legislation seeking to replace the proposition be discussed in the future, our model could prove effective in providing legislators with crucial information regarding public opinion of the former. It is possible that this may occur during the 2022 California elections as well. Our findings reveal that we can use NLP to determine the sentiment and relevancy of tweets. Furthermore, our research reveals that Twitter can be a useful tool in predicting localized elections with suitable accuracy. Sentiment analysis through social media holds the potential to be implemented more frequently and to compete with traditional polling methods.

However, as stated previously, there is inherent bias in social media, and especially among Twitter users, because it is mostly privileged groups that use Twitter, and so they are likely to be over-represented in Twitter data sets. In future work we could look for ways to offset this user bias to make predictions that are more representative of the population as a whole. Another potential change that may improve the results of our research is in the data annotations; we had seven authors each independently annotating 500 rows of data and then cross-checking 50 rows each. Ideally, multiple authors would review each tweet to ensure that every tweet was being evaluated the same way. In the future, we could also apply our models to current political issues in order to gather voter sentiment through social media without traditional polling methods and use the results to predict the outcome of controversial policies. Further research can be done as to whether models on Twitter data sets are a reliable replacement for traditional polling methods and whether active social media users are more representative of one general viewpoint or another. Since our model was able to reasonably predict the outcome of the 2020 Prop 16 election, we anticipate that it can be applied to other policy-based elections. Local elections are held more often than, and frequently receive less attention than, national elections, and therefore applications of this research and the extension of the models created to other similar contexts could greatly impact policy makers and campaigns.

REFERENCES

[1] E. Hargittai, "Potential biases in big data: Omitted voices on Social Media," SAGE Journals, 30-Jul-2018. [Online]. Available: https://journals.sagepub.com/doi/abs/10.1177/0894439318788322. [Accessed: 17-Jul-2022].
[2] K. Jain, F. Doshi, and L. Kurup, "Stance detection using transformer architectures and temporal convolutional networks," SpringerLink, 28-Oct-2020. [Online]. Available: https://link.springer.com/chapter/10.1007/978-981-15-4409-540#chapter-info. [Accessed: 17-Jul-2022].
[3] K. Darwish, W. Magdy, and T. Zanouda, "Trump vs. Hillary: What went viral during the 2016 US presidential election," SpringerLink, 03-Sep-2017. [Online]. Available: https://link.springer.com/chapter/10.1007/978-3-319-67217-510. [Accessed: 17-Jul-2022].
[4] L. Shi, A. Agrawal, N. Agarwal, R. Garg, and J. Spoelstra, "Predicting US primary elections with Twitter - Stanford University." [Online]. Available: https://snap.stanford.edu/social2012/papers/shi.pdf. [Accessed: 18-Jul-2022].
[5] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional Transformers for language understanding," Semantic Scholar, 24-May-2019. [Online]. Available: https://www.semanticscholar.org/paper/BERT%3A-Pre-training-of-Deep-Bidirectional-for-Devlin-Chang/df2b0e26d0599ce3e70df8a9da02e51594e0e992. [Accessed: 29-Jul-2022].
[6] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention Is All You Need," NeurIPS Proceedings, 2017. [Online]. Available: https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html. [Accessed: 17-Jul-2022].
[7] T. Wolf et al., "Transformers: State-of-the-art natural language processing," ACL Anthology, 2020. [Online]. Available: https://aclanthology.org/2020.emnlp-demos.6/. [Accessed: 29-Jul-2022].
[8] D. Küçük and F. Can, "Stance detection: A survey," ACM Digital Library, 06-Feb-2020. [Online]. Available: https://dl.acm.org/doi/abs/10.1145/3369026. [Accessed: 17-Jul-2022].
[9] W. Wei, X. Zhang, X. Liu, W. Chen, and T. Wang, "A specific convolutional neural network system for effective stance detection," ACL Anthology, 2016. [Online]. Available: https://aclanthology.org/S16-1062/. [Accessed: 29-Jul-2022].
[10] Y. Zhou, A. I. Cristea, and L. Shi, "Connecting targets to tweets: Semantic attention-based model for target-specific stance detection," SpringerLink, 04-Oct-2017. [Online]. Available: https://link.springer.com/chapter/10.1007/978-3-319-68783-4-2. [Accessed: 17-Jul-2022].
[11] M. Umer, Z. Imtiaz, S. Ullah, A. Mehmood, G. S. Choi, and B.-W. On, "Fake news stance detection using Deep Learning Architecture (CNN-LSTM)," IEEE Xplore, 2020. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/9178321. [Accessed: 17-Jul-2022].
[12] C. M. Burnett and V. Kogan, "The politics of potholes: Service quality and retrospective voting in local elections," The Journal of Politics, vol. 79, no. 1, 01-Jan-2017. [Online]. Available: https://www.journals.uchicago.edu/doi/10.1086/688736. [Accessed: 29-Jul-2022].
[13] E. J. Dionne and T. E. Mann, "Polling public opinion: The good, the bad, and the ugly," Brookings, 01-Jun-2003. [Online]. Available: https://www.brookings.edu/articles/polling-public-opinion-he-good-the-bad-and-the-ugly/. [Accessed: 29-Jul-2022].
[14] J. Degenhard, "Twitter users in the World 2017-2025," Statista, 20-Jul-2021. [Online]. Available: https://www.statista.com/forecasts/1146722/twitter-users-in-the-world. [Accessed: 10-Aug-2022].
[15] W. Budiharto and M. Meiliana, "Prediction and analysis of Indonesia presidential election from Twitter using sentiment analysis," Journal of Big Data, SpringerLink, 19-Dec-2018. [Online]. Available: https://link.springer.com/article/10.1186/s40537-018-0164-1. [Accessed: 29-Jul-2022].
[16] M. Iyyer, P. Enns, J. Boyd-Graber, and P. Resnik, "Political ideology detection using recursive neural networks," ACL Anthology, 2014. [Online]. Available: https://aclanthology.org/P14-1105.pdf. [Accessed: 10-Aug-2022].
[17] S. Keeter, N. Hatley, A. Lau, and C. Kennedy, "What 2020's election poll errors tell us about the accuracy of issue polling," Pew Research Center Methods, 02-Mar-2021. [Online]. Available: https://www.pewresearch.org/methods/2021/03/02/what-2020s-election-poll-errors-tell-us-about-the-accuracy-of-issue-polling/. [Accessed: 29-Jul-2022].
[18] D. S. Hillygus, "The Evolution of Election Polling in the United States," OUP Academic, 01-Dec-2011. [Online]. Available: https://academic.oup.com/poq/article/75/5/962/1830219. [Accessed: 29-Jul-2022].
[19] "Campaign Expenditures," OpenSecrets. [Online]. Available: https://www.opensecrets.org/campaign-expenditures. [Accessed: 29-Jul-2022].
[20] Y. Liu, "RoBERTa: A robustly optimized BERT pretraining approach," arXiv, 26-Jul-2019. [Online]. Available: https://arxiv.org/abs/1907.11692. [Accessed: 29-Jul-2022].
[21] D. Šeputis, "Investigation of text data augmentation for transformer training via translation technique," Vilnius University Open Series, 2021. [Online]. Available: https://www.journals.vu.lt/open-series/article/view/24036/23341. [Accessed: 07-Aug-2022].
[22] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov, and Q. Le, "XLNet: Generalized autoregressive pretraining for Language Understanding," arXiv, 02-Jan-2020. [Online]. Available: https://arxiv.org/pdf/1906.08237.pdf. [Accessed: 07-Aug-2022].
[23] P. He, X. Liu, J. Gao, and W. Chen, "DeBERTa: Decoding-enhanced BERT with disentangled attention," arXiv, 06-Oct-2021. [Online]. Available: https://arxiv.org/pdf/2006.03654.pdf. [Accessed: 07-Aug-2022].
[24] K. Sinha, R. Jia, D. Hupkes, J. Pineau, A. Williams, and D. Kiela, "Masked Language Modeling and the Distributional Hypothesis," arXiv, 09-Sep-2021. [Online]. Available: https://arxiv.org/pdf/2104.06644v2.pdf. [Accessed: 07-Aug-2022].
[25] M. A. Ganaie, M. Hu, A. K. Malik, M. Tanveer, and P. N. Suganthan, "Ensemble deep learning: A review," arXiv, 06-Apr-2021. [Online]. Available: https://arxiv.org/abs/2104.02395. [Accessed: 11-Aug-2022].
