Hate Speech Detection Task

This project involves building a hate speech detection classifier using PyTorch and BERT. It loads tweet text and labels from a hate speech dataset, explores the data, preprocesses it by handling class imbalance, and splits it into train and validation sets. It defines a BERT classifier model with frozen BERT layers and additional classification layers. The model is trained on batches of the train set for multiple epochs, with evaluation on the validation set after each epoch. Overall it showcases hate speech detection using PyTorch and pre-trained BERT.

Uploaded by Nefavik

This project is a hate speech detection task, which involves automatically
determining whether a piece of text contains hateful content. The classifier for this
task was built using PyTorch and a pre-trained BERT model.

The project starts by selecting the GPU as the compute device if one is available.
Then the necessary libraries, such as `transformers`, are installed. The data is
loaded from the corpus of "Hate Towards the Political Opponent: A Twitter Corpus
Study of the 2020 US Elections". Only the 'text' and 'HOF' (label) columns are used
for the task.
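
As a minimal sketch of the column selection, the snippet below reads a delimited file and keeps only 'text' and 'HOF'. It uses the standard library for portability; the project itself may have used another loader. The sample rows, the tab delimiter, the extra `other` column, and the label value 'Hateful' are assumptions made for illustration.

```python
import csv
import io

# Hypothetical stand-in for the corpus file. The real file name, delimiter,
# and additional columns are not given in the description above.
SAMPLE_TSV = (
    "text\tother\tHOF\n"
    "example tweet one\tx\tNon-Hateful\n"
    "example tweet two\ty\tHateful\n"
)


def load_text_and_labels(handle, delimiter="\t"):
    """Read a delimited file and keep only the 'text' and 'HOF' columns."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return [{"text": row["text"], "HOF": row["HOF"]} for row in reader]


rows = load_text_and_labels(io.StringIO(SAMPLE_TSV))
print(rows[0])
```

For a file on disk, the same function can be called with `open(path, newline="", encoding="utf-8")` in place of the `StringIO` object.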

Next, the project proceeds with data exploration and visualization. Basic statistics
about the dataset, including the number of tweets in each class and a sample of the
dataset, are displayed. The distribution of tweets by class and the histogram of tweet
lengths are also visualized using matplotlib.
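
The statistics behind those plots can be sketched as follows. The rows are made up, and measuring tweet length in characters (rather than words or tokens) is an assumption, as is the bin width.

```python
from collections import Counter

# Made-up rows for illustration only.
rows = [
    {"text": "short tweet", "HOF": "Non-Hateful"},
    {"text": "a somewhat longer example tweet", "HOF": "Non-Hateful"},
    {"text": "another example", "HOF": "Hateful"},
]

# Number of tweets in each class.
class_counts = Counter(row["HOF"] for row in rows)

# Tweet lengths, here measured in characters.
lengths = [len(row["text"]) for row in rows]


def length_histogram(lengths, bin_width=20):
    """Count items per length bin; each key is the lower edge of a bin."""
    bins = Counter((n // bin_width) * bin_width for n in lengths)
    return dict(sorted(bins.items()))


print(class_counts)
print(length_histogram(lengths))
```

The class counts map directly onto a bar chart and the raw `lengths` list onto a histogram in matplotlib.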

The data is preprocessed by mapping the labels to binary integers and handling
class imbalance by downsampling the majority class ('Non-Hateful'). The training
data is then split into train and development (validation) sets for model evaluation.
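
A minimal sketch of these three steps, using only the standard library. The label value 'Hateful', the choice to downsample to exactly the minority-class size, the 80/20 split, and the seeds are all assumptions; the description above does not state them.

```python
import random

# Assumed label names; only 'Non-Hateful' is named in the description.
LABEL_TO_ID = {"Non-Hateful": 0, "Hateful": 1}


def encode_labels(rows):
    """Map string labels to binary integers."""
    return [{"text": r["text"], "label": LABEL_TO_ID[r["HOF"]]} for r in rows]


def downsample_majority(rows, seed=0):
    """Randomly reduce every class to the size of the smallest class."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row["label"], []).append(row)
    minority_size = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, minority_size))
    rng.shuffle(balanced)
    return balanced


def train_dev_split(rows, dev_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction as the dev set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_fraction)
    return shuffled[n_dev:], shuffled[:n_dev]


raw = [{"text": f"tweet {i}", "HOF": "Non-Hateful"} for i in range(8)]
raw += [{"text": f"tweet {i}", "HOF": "Hateful"} for i in range(8, 10)]

balanced = downsample_majority(encode_labels(raw))
train_rows, dev_rows = train_dev_split(balanced, dev_fraction=0.25)
print(len(balanced), len(train_rows), len(dev_rows))
```

Downsampling discards majority-class examples, so it trades data volume for balance; splitting after balancing, as here, also balances the development set.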

The data is prepared by creating a `BERTDataset` class, which performs cleaning,
tokenization, and encoding of the tweets using the BERT tokenizer. The class also
stores the labels. Additionally, a collate function is defined to handle batch creation
and padding.
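
The sketch below shows one way such a dataset and collate function could fit together. It is not the project's `BERTDataset`: the class name `TweetDataset`, the cleaning rules, and `toy_encode` are hypothetical, and `toy_encode` stands in for the BERT tokenizer so the example runs without downloading a vocabulary.

```python
import re

import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import Dataset


def clean_tweet(text):
    """Illustrative cleaning: drop URLs and @mentions, collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def toy_encode(text):
    """Hypothetical stand-in tokenizer: each word becomes its length (never 0).

    With Hugging Face transformers, a callable such as
    lambda t: tokenizer.encode(t, truncation=True, max_length=128)
    could be passed instead.
    """
    return [len(word) for word in text.split()]


class TweetDataset(Dataset):
    """Cleans and encodes the tweets once, and stores the labels."""

    def __init__(self, texts, labels, encode):
        self.input_ids = [encode(clean_tweet(t)) for t in texts]
        self.labels = list(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.input_ids[idx], self.labels[idx]


def collate_batch(batch, pad_id=0):
    """Pad sequences to the longest one in the batch and build a mask."""
    ids, labels = zip(*batch)
    input_ids = pad_sequence(
        [torch.tensor(x, dtype=torch.long) for x in ids],
        batch_first=True,
        padding_value=pad_id,
    )
    attention_mask = (input_ids != pad_id).long()
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": torch.tensor(labels, dtype=torch.long),
    }


dataset = TweetDataset(["hello @user world http://t.co/x", "hi"], [1, 0], toy_encode)
batch = collate_batch([dataset[0], dataset[1]])
print(batch["input_ids"])
```

Passing `collate_fn=collate_batch` to `torch.utils.data.DataLoader` applies the same padding to every batch, so each batch is only as wide as its longest tweet.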

The project defines a BERT classifier model using the `BERTClassifier` class. It
consists of a BERT model, a dropout layer, and a linear layer for classification. The
BERT layers are frozen, so only the classification layer is trained, and the model is
moved to the available device (GPU or CPU).
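
A sketch of this architecture under stated assumptions: the class name `FrozenEncoderClassifier`, the dropout rate, and the use of the encoder's `pooler_output` are illustrative choices, not details from the project. `ToyEncoder` is a hypothetical stand-in that mimics the call signature of a BERT model so the example runs offline.

```python
import types

import torch
from torch import nn


class FrozenEncoderClassifier(nn.Module):
    """Frozen encoder followed by dropout and a linear classification layer."""

    def __init__(self, encoder, hidden_size, num_classes=2, dropout=0.1):
        super().__init__()
        self.encoder = encoder
        # Freeze the encoder so that only the classification layer is trained.
        for param in self.encoder.parameters():
            param.requires_grad = False
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        return self.classifier(self.dropout(outputs.pooler_output))


class ToyEncoder(nn.Module):
    """Hypothetical stand-in for BERT: embeds tokens and mean-pools them."""

    def __init__(self, vocab_size=100, hidden_size=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)

    def forward(self, input_ids, attention_mask):
        hidden = self.embed(input_ids)
        mask = attention_mask.unsqueeze(-1).to(hidden.dtype)
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
        return types.SimpleNamespace(pooler_output=pooled)


# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = FrozenEncoderClassifier(ToyEncoder(), hidden_size=16).to(device)

input_ids = torch.tensor([[5, 5], [2, 0]], device=device)
attention_mask = (input_ids != 0).long()
logits = model(input_ids, attention_mask)
print(logits.shape)
```

With Hugging Face transformers, a pre-trained `BertModel` could be passed as `encoder`, with `hidden_size` taken from its `config.hidden_size`.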

The model is trained using the Adam optimizer and cross-entropy loss for a specified
number of epochs. Training is performed in a loop over batches from the train
dataset. The model is put into training mode and, for each batch, the gradients
are cleared, the data is moved to the device, a forward pass is performed, the loss is
computed, the gradients are calculated by backpropagation and clipped to prevent
exploding gradients, and the parameters are updated.
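
The per-batch steps listed above can be sketched as a single function. The tiny linear model, the random batches, the learning rate, and the clipping threshold of 1.0 are assumptions chosen to keep the example small; they are not the project's settings.

```python
import torch
from torch import nn


def train_one_epoch(model, batches, optimizer, loss_fn, device, max_grad_norm=1.0):
    """Run one pass over the batches and return the mean training loss."""
    model.train()                                   # training mode
    total_loss = 0.0
    for features, labels in batches:
        optimizer.zero_grad()                       # clear old gradients
        features = features.to(device)              # move data to the device
        labels = labels.to(device)
        logits = model(features)                    # forward pass
        loss = loss_fn(logits, labels)              # compute the loss
        loss.backward()                             # compute gradients
        nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)  # clip
        optimizer.step()                            # update parameters
        total_loss += loss.item()
    return total_loss / len(batches)


torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy stand-ins for the classifier and the training batches.
model = nn.Linear(4, 2).to(device)
batches = [(torch.randn(8, 4), torch.randint(0, 2, (8,))) for _ in range(5)]

optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

losses = [
    train_one_epoch(model, batches, optimizer, loss_fn, device)
    for _ in range(3)
]
print(losses)
```

When part of the model is frozen, passing only the parameters with `requires_grad=True` to the optimizer avoids tracking parameters that never receive gradients.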

After each training epoch, the model is evaluated on the validation set. The model is
put into evaluation mode, and the predictions and true labels are collected for further
evaluation.
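
A minimal sketch of this evaluation pass follows. The toy model with fixed weights and the hand-written batches are assumptions used to make the result easy to check; accuracy is shown as one possible metric, since the description does not say which metrics were computed.

```python
import torch
from torch import nn


def evaluate(model, batches, device):
    """Collect predicted and true labels without tracking gradients."""
    model.eval()                                    # evaluation mode
    predictions, true_labels = [], []
    with torch.no_grad():
        for features, labels in batches:
            logits = model(features.to(device))
            predictions.extend(logits.argmax(dim=1).cpu().tolist())
            true_labels.extend(labels.tolist())
    return predictions, true_labels


def accuracy(predictions, true_labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == t for p, t in zip(predictions, true_labels))
    return correct / len(true_labels)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy model with identity weights: it predicts the index of the larger feature.
model = nn.Linear(2, 2).to(device)
with torch.no_grad():
    model.weight.copy_(torch.eye(2))
    model.bias.zero_()

val_batches = [
    (torch.tensor([[2.0, 1.0], [0.0, 3.0]]), torch.tensor([0, 1])),
    (torch.tensor([[1.0, 5.0]]), torch.tensor([0])),
]

preds, labels = evaluate(model, val_batches, device)
print(preds, labels, accuracy(preds, labels))
```

Evaluation mode disables dropout, and `torch.no_grad()` skips gradient bookkeeping, so the pass is deterministic and cheaper than a training step.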

Overall, this project showcases the implementation of a hate speech detection task
using PyTorch and a pre-trained BERT model, including data loading, preprocessing,
model building, training, and evaluation.
