
Online Social Networks and Media 19 (2020) 100096

Contents lists available at ScienceDirect

Online Social Networks and Media


journal homepage: www.journals.elsevier.com/online-social-networks-and-media

Hate and offensive speech detection on Arabic social media


Safa Alsafari a,b,*, Samira Sadaoui a, Malek Mouhoub a
a University of Regina, Regina, Canada
b University of Jeddah, Jeddah, Saudi Arabia

ARTICLE INFO

Keywords:
Hate speech
Social media
Arabic corpus
Data extraction
Data annotation
Feature extraction
Multi-class classification
Deep learning

ABSTRACT

We are witnessing an increasing proliferation of hate speech on social media targeting individuals for their protected characteristics. Our study aims to devise an effective Arabic hate and offensive speech detection framework to address this serious issue. First, we build a reliable Arabic textual corpus by crawling data from Twitter using four robust extraction strategies that we implement based on four types of hate: religion, ethnicity, nationality, and gender. Next, we label the corpus based on a three-level hierarchical annotation scheme in which we verify the inter-annotator agreement to ensure ground truth at each level. Based on machine and deep learning techniques, we develop numerous two-class, three-class, and six-class classification models that we combine with a variety of feature extraction techniques, such as contextual word embeddings. Finally, we conduct an intensive experiment to assess the performance of the different learned models and to examine the misclassification errors. The performance results are very encouraging compared to prior hate and offensive speech studies carried out on Arabic and other languages.
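As an illustration of how two-class, three-class, and six-class label sets can be derived from one hierarchical annotation, the following minimal sketch collapses a fine-grained label into its coarser levels. Only the four hate types (religion, ethnicity, nationality, gender) are taken from the text above. The label names "clean", "offensive", "hate", and "offensive_or_hate", the grouping into levels, and the function names are assumptions made for illustration and are not necessarily the scheme used in this work.

```python
# Hypothetical label hierarchy: only the four hate types come from the text;
# every other label name and the grouping itself are illustrative assumptions.
HATE_TYPES = ("religion", "ethnicity", "nationality", "gender")
SIX_CLASS = ("clean", "offensive") + HATE_TYPES


def to_three_class(label: str) -> str:
    """Collapse a six-class label into clean / offensive / hate."""
    if label not in SIX_CLASS:
        raise ValueError(f"unknown label: {label!r}")
    return "hate" if label in HATE_TYPES else label


def to_two_class(label: str) -> str:
    """Collapse a six-class label into clean vs. offensive_or_hate."""
    return "clean" if to_three_class(label) == "clean" else "offensive_or_hate"


if __name__ == "__main__":
    for fine in SIX_CLASS:
        print(fine, "->", to_three_class(fine), "->", to_two_class(fine))
```

Under these assumptions, a single fine-grained annotation per tweet is enough to train models at all three granularities, since the coarser labels are computed rather than annotated separately.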

1. Introduction

1.1. Scope

In the last decade, the systematic detection of hate speech and other offensive language has attracted attention from many scholars due to the increasing proliferation of harmful and offensive content on social network sites. The majority of these sites have hate speech policies in place that prohibit any posts that attack or threaten individuals or groups based on their protected characteristics, such as race, ethnicity, religion, gender, and nationality. Nevertheless, filtering hate speech still relies heavily on reporting misconduct on the one hand, and monitoring by moderators on the other hand. Manually monitoring the tremendous volume of content on social media is a laborious task. Consequently, several research studies have emerged in the hate speech detection domain, as well as competitions and workshops, such as TRAC 2019 [1], HateEval 2019 [2] and OffensEval 2019 [3], which clearly emphasize the growing significance of this subject. Hate speech detection is usually modelled as a supervised classification problem where the learning algorithms are trained on datasets that comprise some form of abusive and hateful language. In this article, we are particularly interested in Arabic Twitter because of the increasing rate of hateful and offensive content on this platform. Therefore, the real-time detection of such content becomes vital.

Although some progress has been made in English hate speech detection, most of the research has primarily focused on detecting abusive and offensive language, or only one specific type of hate speech, as discussed in the related work section. This subject still appears to be at a very early stage of research for the Arabic dialects. To the best of our knowledge, our work is the first attempt to detect four types of Arabic hate speech, including gender-based and nationality-based hate speech. Furthermore, all of the existing corpora of Arabic offensive or hate speech language are multi-dialect, which means they contain texts in different Arabic dialects, such as Modern Standard Arabic, Gulf, Egyptian, and Levantine. The mixed language may negatively affect the ability of the annotators, especially if they are native speakers of only one of the dialects. This drawback is observed in most of the past studies, especially when annotators are selected via crowdsourcing.

1.2. Contributions

To overcome the weaknesses of previous studies, we have devised an effective approach for detecting hate and offensive speech. Hereafter, we highlight our main contributions:

* Corresponding author.
E-mail address: sba166@uregina.ca (S. Alsafari).

https://doi.org/10.1016/j.osnem.2020.100096
Received 21 March 2020; Received in revised form 22 June 2020; Accepted 18 August 2020
Available online 16 September 2020
2468-6964/© 2020 Elsevier B.V. All rights reserved.
Online Social Networks and Media, 19 (2020) 100096. doi:10.1016/j.osnem.2020.100096
