Presented By:
Waheed Abbas
FA17-RCS-013
Supervisor:
Dr. Rao Muhammad Adeel Nawab
Co-Supervisor:
Mr. Muhammad Sharjeel
Agenda
Toxic Comment Classification Of Roman Urdu Text
Introduction
Research Focus
Literature Review
Problem Statement
Research Methodology
Estimated Timetable
Task Description
Introduction
We live in an era of technology that offers many platforms (social media and
discussion groups) where we not only share our personal lives but also take
part in discussions and seek opinions on many matters.
Online Harassment (Pew Research Center report, 2017)
66% Seen Harassment
41% Personally Experienced It
2 Dunyia me bohat kamina chawal insan dakhy magar yar tumhrai bat hi kuch or ha lanti...
3 Yeh MQM waley kuttay ki dum ki tarah hain. Yeh kabi theak nai ho saktay.
4 Botni k mar q nai jaty ye roz koi na koi drama ho raha hota ha... kon c manhoos ghaere te jb to PTI main shamil hua tha.
1 Pak army Pak ISI zindabad. Hamin apni ISI or army pr Allah k bad bharosa hy Inshallah ab wohe ho ga jo Pak chahy ga.
2 Pagal kar doge hume kisi din tum Hansa Hansa ke yaar...
Non-Toxic Communication in Online Community
Non-Toxic Chat Application
Sentiment Analysis
Research Focus
Literature Review
Topic | Year | Dataset | Technique | Evaluation
Convolutional Neural Networks for Toxic Comment Classification | 2018 | TCC Competition Kaggle dataset | CNN, kNN, LDA, NB, SVM | Accuracy: CNN 0.912, kNN 0.697, LDA 0.808, NB 0.719, SVM 0.811
Challenges for Toxic Comment Classification: An In-Depth Error Analysis | 2018 | TCC Competition Kaggle dataset | CNN, LSTM, Bidirectional GRU | Accuracy: CNN 0.981, LSTM 0.980, GRU 0.983, LR 0.975
Predictive Embeddings for Hate Speech Detection on Twitter | 2018 | Twitter HATE dataset | LR, GRU, TWEM (Proposed) | F1 Score: LR 0.85, GRU 0.89, TWEM 0.92
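The reviewed papers report either accuracy or F1 score. As a minimal sketch of how the two measures differ, the snippet below computes both from the counts of a binary toxic/non-toxic confusion matrix. The counts are hypothetical and chosen only for illustration; they do not come from any of the papers above.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all comments that were labelled correctly."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the toxic class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 1,000 comments, 100 of them toxic.
tp, fp, fn, tn = 60, 20, 40, 880
print(f"accuracy = {accuracy(tp, fp, fn, tn):.3f}")
print(f"F1       = {f1_score(tp, fp, fn):.3f}")
```

With these assumed counts the classifier misses 40 of the 100 toxic comments, yet accuracy stays high because non-toxic comments dominate. F1 on the toxic class is noticeably lower, which is one reason accuracy and F1 figures should not be compared directly.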
Topic | Year | Dataset | Technique | Evaluation
Ex machina: Personal attacks seen at scale | 2017 | 100K high-quality human-annotated dataset | LR (n-gram, word-gram), MLP (n-gram, word-gram) | Accuracy: LR Word 94.6%, LR Char 96.1%, MLP Word 95.2%, MLP Char 95.9%
Using convolutional neural networks to classify hate-speech | 2017 | Twitter tweets totalling 6655 (Racism 91, Sexism 946, Both 18, Non hate 5600) | CNN, LR (n-gram) | F1 Score: CNN 0.782, LR 0.738
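The table distinguishes word-level from character-level n-gram features (LR Word vs. LR Char). The following is a minimal sketch of the two feature types in plain Python, not the feature pipeline of any reviewed paper. The word "theak" comes from the example comments in the Introduction; the alternative spelling "theek" is an assumption used only to illustrate spelling variation in Roman Urdu.

```python
def word_ngrams(text, n):
    """Contiguous sequences of n whitespace-separated tokens."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Contiguous sequences of n characters."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


print(word_ngrams("Yeh kabi theak nai", 2))

# Two spellings of the same word (the second one is hypothetical).
a, b = "theak", "theek"
shared = set(char_ngrams(a, 2)) & set(char_ngrams(b, 2))
print(sorted(shared))
```

At the word level the two spellings are unrelated tokens, whereas their character bigrams overlap. This may be relevant for Roman Urdu, which has no standard spelling, though whether character features actually help is a question for the experiments.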
Problem Statement
Research Methodology
Flowchart (image not recovered); surviving labels: Start, End, Dataset
Estimated Timetable
Bibliography
H. Hosseini, S. Kannan, B. Zhang, and R. Poovendran, “Deceiving Google’s Perspective API Built for Detecting Toxic Comments,” arXiv preprint arXiv:1702.08138, 2017.
Z. Waseem and D. Hovy, “Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter,” in Proceedings of the NAACL Student Research Workshop, 2016, pp. 88–93.
E. Wulczyn, N. Thain, and L. Dixon, “Ex machina: Personal attacks seen at scale,” in Proceedings of the 26th International Conference on World Wide Web, 2017, pp. 1391–1399.
P. Badjatiya, S. Gupta, M. Gupta, and V. Varma, “Deep learning for hate speech detection in tweets,” in Proceedings of the 26th International Conference on World Wide Web Companion, 2017, pp. 759–760.
H. Zhong et al., “Content-Driven Detection of Cyberbullying on the Instagram Social Network,” in IJCAI, 2016, pp. 3952–3958.
J. H. Park and P. Fung, “One-step and two-step classification for abusive language detection on Twitter,” arXiv preprint arXiv:1706.01206, 2017.
Y. Chen, Y. Zhou, S. Zhu, and H. Xu, “Detecting offensive language in social media to protect adolescent online safety,” in Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Conference on Social Computing (SocialCom), 2012, pp. 71–80.
S. V. Georgakopoulos, S. K. Tasoulis, A. G. Vrahatis, and V. P. Plagianakos, “Convolutional Neural Networks for Toxic Comment Classification,” arXiv preprint arXiv:1802.09957, 2018.
T. Davidson, D. Warmsley, M. Macy, and I. Weber, “Automated hate speech detection and the problem of offensive language,” arXiv preprint arXiv:1703.04009, 2017.
R. Kshirsagar, T. Cukuvac, K. McKeown, and S. McGregor, “Predictive Embeddings for Hate Speech Detection on Twitter,” arXiv preprint arXiv:1809.10644, 2018.
J. Golbeck et al., “A large labeled corpus for online harassment research,” in Proceedings of the 2017 ACM on Web Science Conference, 2017, pp. 229–233.
B. Gambäck and U. K. Sikdar, “Using convolutional neural networks to classify hate-speech,” in Proceedings of the First Workshop on Abusive Language Online, 2017, pp. 85–90.
Z. Waseem, “Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter,” in Proceedings of the First Workshop on NLP and Computational Social Science, 2016, pp. 138–142.
B. van Aken, J. Risch, R. Krestel, and A. Löser, “Challenges for Toxic Comment Classification: An In-Depth Error Analysis,” arXiv preprint arXiv:1809.07572, 2018.
Thank You
Email: waheed0332@gmail.com
Contact: +923157022503