
CHAPTER 1

INTRODUCTION
In recent years, the exponential growth of social media such as Facebook,
Twitter, and YouTube has revolutionized communication and content publishing,
but these platforms are also increasingly exploited for the propagation of hate,
offensive, and profane speech. The anonymity and mobility afforded by such
media have made the breeding and spread of hate speech, eventually leading to
hate crime, effortless in a virtual landscape beyond the reach of traditional law
enforcement.

The term ‘hate speech’ has been formally defined as ‘any
communication that disparages a person or a group on the basis of some
characteristic (to be referred to as a type of hate or hate class) such as race,
colour, ethnicity, gender, sexual orientation, nationality, religion, or other
characteristics’.

The term ‘offensive language’ refers to an offence that is charged when
someone uses foul or abusive language. It is most commonly charged either where a
person has verbally abused police, or alongside other, more serious charges. The
offence of offensive language is contained in section 4A of the Summary
Offences Act 1988, which states: “A person must not use offensive language in or
near, or within hearing from, a public place or a school.”

The term ‘profanity’ denotes socially offensive language, which may also
be called ‘cursing’ or ‘swearing’ (British English), ‘cuss words’ (American
English vernacular), ‘swear words’, ‘bad words’, or ‘expletives’. Used in this
sense, profanity is language that is generally considered by certain parts of a
culture to be strongly impolite, rude, or offensive. It can show a debasement of
someone or something, or be considered an expression of strong feeling towards
something.

1.1 MOTIVATION

Building effective countermeasures against online hate, offensive, and
profane speech requires, as a first step, identifying and tracking such speech
online. For years, social media companies such as Twitter, Facebook, and
YouTube have been investing hundreds of millions of Rupees every year in this
task, but are still criticised for not doing enough. This is largely because such
efforts are primarily based on manual moderation to identify and delete offensive
material. The process is labour intensive, time consuming, and neither
sustainable nor scalable in reality.

A large body of research has been conducted in recent years to
develop automatic methods for detecting hate, offensive, and profane speech in
the social media domain. These typically employ semantic content analysis
techniques built on Natural Language Processing (NLP) and Machine Learning
(ML) methods, both of which are core pillars of Semantic Web research. The
task typically involves classifying textual content as non-hate or hateful, in which
case it may also identify the type of the hate, offensive, or profane speech.
Although current methods have reported promising results, we notice that their
evaluations are largely biased towards detecting content that is non-hate, as
opposed to detecting and classifying real hateful content. A limited number of
studies have shown that, for example, state-of-the-art methods for detecting
sexist messages obtain F1 scores between 15 and 60 percentage points lower than
for detecting non-hate messages. These results suggest that it is much harder to
detect hateful content and its types than non-hate content. However, from a
practical point of view, we argue that the ability to detect and identify specific
types of hate speech correctly (precision) and thoroughly (recall) is more
desirable. For example, social media companies need to flag hateful content for
moderation, while law enforcement agencies need to identify hateful messages
and their nature as forensic evidence.

We are concerned with the task of detecting, identifying, and
analysing the spread of hate, offensive, and profane speech sentiments on
social media.

To address concerns about children’s access to offensive content over the
Internet, administrators of social media often manually review online content to
detect and delete offensive material. However, the manual task of identifying
offensive content is labour intensive, time consuming, and thus neither
sustainable nor scalable in reality. Some automatic content-filtering software
packages, such as Appen and Internet Security Suite, have been developed to
detect and filter offensive online content. Most of them simply block webpages
and paragraphs that contain dirty words. These word-based approaches not only
affect the readability and usability of websites, but also fail to identify subtle
offensive messages. For example, under these conventional approaches, the
sentence “you are such a crying baby” will not be identified as offensive content,
because none of its words is included in general offensive lexicons. In addition,
the false-positive rate of these word-based detection approaches is often high,
due to the word-ambiguity problem, i.e., the same word can have very different
meanings in different contexts.
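The weakness of word-based filtering can be seen in a few lines of Python. The lexicon below is a tiny hypothetical stand-in for a real offensive-word list; the point is only that a subtle insult with no lexicon words slips through:

```python
# Hypothetical lexicon standing in for a real offensive-word list.
OFFENSIVE_LEXICON = {"idiot", "stupid", "moron"}

def word_based_filter(sentence: str) -> bool:
    """Flag a sentence if any of its words appears in the lexicon."""
    words = (w.strip(".,!?") for w in sentence.lower().split())
    return any(w in OFFENSIVE_LEXICON for w in words)

print(word_based_filter("You are such an idiot!"))      # True
print(word_based_filter("you are such a crying baby"))  # False: the insult is missed
```

The second sentence is exactly the failure case described above: every individual word is innocuous, so only a method that models context can catch it.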

Pornographic language refers to the portrayal of explicit sexual
subject matter for the purposes of sexual arousal and erotic satisfaction. Offensive
language includes any communication outside the law that disparages a person or a
group on the basis of some characteristic such as race, colour, ethnicity, gender,
sexual orientation, nationality, or religion. All of these are generally immoral and
harmful to adolescents’ mental health.

1.2 STATEMENT OF THE PROBLEM


In HASOC, we break the given content down into four
classes (HATE, OFFENSIVE, PROFANE, and NONE), taking into account the type and
target of the statements.

Offensive language identification: Here we are interested in the identification
of offensive posts and posts containing any form of (untargeted) profanity. A
given statement can be classified into one of four categories:

Hate: if the statement contains hate words that disparage a person or a group on
the basis of some characteristic such as race, colour, gender, nationality,
religion, or other characteristics.

Offensive: if the statement contains offensive language or a targeted (veiled or
direct) offence. In summary, this category includes insults and threats.

Profane: if the statement contains words that are strongly impolite, rude, or
offensive, or that can be considered an expression of strong feeling towards
something.

None: if the statement contains none of the above.
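For implementation purposes, the four classes can be held in a simple label mapping. This is an illustrative Python sketch; the exact label strings depend on the dataset's annotation scheme:

```python
# Illustrative mapping between class names and integer ids.
CLASSES = ["NONE", "HATE", "OFFENSIVE", "PROFANE"]
LABEL_TO_ID = {label: i for i, label in enumerate(CLASSES)}
ID_TO_LABEL = {i: label for label, i in LABEL_TO_ID.items()}

print(LABEL_TO_ID["HATE"])  # 1
print(ID_TO_LABEL[3])       # PROFANE
```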

1.3 FLOW OF WORK

The identification of a statement into the target classes is carried out in the
following steps:

1) Data Extraction: In this step we extract the data from the datasets into data frames.

2) Data Cleaning: In this step we take the required data, in the required format, from
the data frames through the following processes:

 Removing special symbols
 Removing stop words
 Converting the acquired data into tokens

3) Sentiment Analysis: In this step we classify whether the data is a positive-context
statement or a negative-context statement.

4) Label Encoding: In this step we label the statements in the given data so that we
obtain the numeric labels that machine learning algorithms require.

5) Machine Learning Classifier: In this step we use a classifier to classify the
data into the required target classes:
 Splitting the data
 Training the model
 Predicting the target classes with the model

6) Result Analysis: In this step we check the results obtained by predicting the target
classes with the machine learning classifier:
 Building the confusion matrix
 Computing precision and recall
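The steps above can be sketched end to end in plain Python. Everything here is illustrative: the stop-word list is tiny, the training sentences are made up, and a hand-rolled multinomial Naive Bayes stands in for whatever classifier is actually used (the sentiment-analysis step is omitted for brevity):

```python
import math
import re
from collections import Counter, defaultdict

# 2) Data Cleaning: remove special symbols and stop words, then tokenize.
STOP_WORDS = {"a", "an", "the", "is", "are", "you", "such", "of", "and"}  # tiny illustrative list

def clean(text: str) -> list:
    text = re.sub(r"[^a-z\s]", " ", text.lower())   # strip special symbols
    return [t for t in text.split() if t not in STOP_WORDS]

# 4) Label Encoding: the four target classes as fixed labels.
CLASSES = ["NONE", "HATE", "OFFENSIVE", "PROFANE"]

# Made-up training data standing in for the real dataset.
TRAIN = [
    ("have a nice day friend", "NONE"),
    ("i hate those people go back", "HATE"),
    ("you are a total loser", "OFFENSIVE"),
    ("damn this is hell", "PROFANE"),
]

# 5) Machine Learning Classifier: a hand-rolled multinomial Naive Bayes.
word_counts = defaultdict(Counter)
class_counts = Counter()
for text, label in TRAIN:
    class_counts[label] += 1
    word_counts[label].update(clean(text))
VOCAB = {w for counts in word_counts.values() for w in counts}

def predict(text: str) -> str:
    tokens = clean(text)
    best_label, best_score = None, float("-inf")
    for label in CLASSES:
        score = math.log(class_counts[label] / len(TRAIN))  # log prior
        total = sum(word_counts[label].values())
        for t in tokens:                                    # add-one smoothing
            score += math.log((word_counts[label][t] + 1) / (total + len(VOCAB)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# 6) Result Analysis: per-class precision and recall from predictions.
def precision_recall(preds, golds, target):
    tp = sum(p == target and g == target for p, g in zip(preds, golds))
    fp = sum(p == target and g != target for p, g in zip(preds, golds))
    fn = sum(p != target and g == target for p, g in zip(preds, golds))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

TEST = [("i hate those people", "HATE"), ("have a nice day", "NONE")]
preds = [predict(text) for text, _ in TEST]
golds = [gold for _, gold in TEST]
print(preds)                                   # ['HATE', 'NONE']
print(precision_recall(preds, golds, "HATE"))  # (1.0, 1.0)
```

In practice each stage would be swapped for a production component (e.g. a data-frame loader for step 1, a full stop-word list for step 2, and a library classifier for step 5), but the control flow is the same.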
1.4 ORGANIZATION OF REPORT

This report is structured as follows:

Chapter 2: describes how we arrived at the flow of work by referring to
different documents.

Chapter 3: describes how we are going to solve the problem.

Chapter 4: describes the given dataset, the language used, and the implementation
of the methods used to solve the problem.

Chapter 5: discusses whether the model can be accepted or rejected by considering
different parameters.

Chapter 6: concludes this work and discusses future work.
