You are on page 1of 54

A MINI PROJECT REPORT

On
DETECTION OF CYBERBULLYING ON SOCIAL
MEDIA USING MACHINE LEARNING
Submitted by,

Ettaveni Ashish 20J41A0521


Chintawar Vamshika 20J41A0516
Hansraj Saini 19J41A05K9
Munugala Varshit 20J41A0540
in partial fulfillment of the requirements for the award of the degree
of
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING

Under the Guidance of


Dr. B.HARI KRISHNA
Associate Professor, Computer Science and Engineering

COMPUTER SCIENCE AND ENGINEERING


MALLA REDDY ENGINEERING COLLEGE (AUTONOMOUS)
(An UGC Autonomous Institution, Approved by AICTE, New Delhi & Affiliated to JNTUH,
Hyderabad) Maisammaguda, Secunderabad, Telangana, India 500100

DECEMBER-2023
MALLA REDDY ENGINEERING COLLEGE
(AUTONOMOUS)
Maisammaguda, Secunderabad, Telangana, India 500100

BONAFIDE CERTIFICATE

This is to certify that this mini project work entitled “DETECTION OF


CYBERBULLYING ON SOCIAL MEDIA USING MACHINE
LEARNING”, submitted by E.ASHISH(20J41A0521),
CH.VAMSHIKA(20J41A0516), HANSRAJ SAINI(19J41A05K9),
M.VARSHIT(20J41A0540) to Malla Reddy Engineering College
affiliated to JNTUH, Hyderabad in partial fulfillment for the award of
Bachelor of Technology in COMPUTER SCIENCE AND
ENGINEERING is a bonafide record of project work carried out under
my/our supervision during the academic year 2023 – 2024 and that this
work has not been submitted elsewhere for a degree.

SIGNATURE SIGNATURE
Dr. B. HARI KRISHNA Dr. P. S. R. C. MURTY
SUPERVISOR HOD
Associate Professor Professor
Computer Science Engineering Computer Science Engineering
Malla Reddy Engineering College, Malla Reddy Engineering College,
Secunderabad, 500 100 Secunderabad, 500 100

Submitted for Mini Project viva-voce examination held on _________

INTERNAL EXAMINER EXTERNAL EXAMINER

i
MALLA REDDY ENGINEERING COLLEGE
Maisammaguda, Secunderabad, Telangana, India 500100

ACKNOWLEDGEMENT
We express our sincere thanks to our Principal. Dr. A. Ramaswami Reddy, who took
keen interest and encouraged us in every effort during the project work.

We express our heartfelt thanks to our Dr. P .S .R .C Murty, Professor and Head,
Department of Computer Science and Engineering, for his kind attention and valuable
guidance throughout the project work.

We are thankful to our Project Coordinator Dr. Arun Kumar Kandru , Associate
Professor , Department of Computer Science and Engineering, for his cooperation
during the project work .

We are extremely thankful to our Project Guide Dr. B. Hari Krishna, for his constant
guidance and support in completing the project work.

We also thank all the teaching and non-teaching staff of the Department for their
cooperation during the project work.

ETTAVENI ASHISH -20J41A0521


CHINTAWAR VAMSHIKA -20J41A0516
HANSRAJ SAINI -19J41A05K9
MUNUGALA VARSHIT -20J41A0540

ii
ABSTRACT

One of the most harmful consequences of social media is the rise of cyberbullying,
which tends to be more sinister than traditional bullying given that online records
typically live on the internet for quite a long time and are hard to control. In this paper,
we present a three-phase algorithm, called BullyNet, for detecting cyberbullies on
Twitter social network. We exploit bullying tendencies by proposing a robust method
for constructing a cyberbullying signed network. We analyze tweets to determine their
relation to cyberbullying, while considering the context in which the tweets exist in
order to optimize their bullying score. We also propose a centrality measure to detect
cyberbullies from a cyberbullying signed network, and we show that it outperforms
other existing measures. We experimented on a dataset of 5.6 million tweets and our
results shows that the proposed approach can detect cyberbullies with high accuracy,
while being scalable with respect to the number of tweets.

Keywords:
Cyberbullying , Protection, Security, Hate, offend, malicious activities, Trolling,
Vulnerable, Social media ,Internet, Harass, Mistreat, Machine-learning.

iii
TABLE OF CONTENTS

ABSTRACT iii
LIST OF FIGURES vii
LIST OF AND ABBREVIATIONS viii

Chapter Description Page No.


1 INTRODUCTION 1-3
2 LITERATURE SURVEY 4-6
3 SYSTEM ANALYSIS 7-9
3.1. Existing System 7
3.2. Existing System Disadvantages 8
3.3. Proposed System 8
3.4. Proposed System Advantages 9
4 SYSTEM STUDY 10-11
5.1. Feasibility Study 10

5.2. Economic Feasibility 10

5.3. Technical Feasibility 10

4.4. Social Feasibility 11

5 SYSTEM DESIGN 12-17


5.1. Architecture Diagram 12
5.1.1. Input Design 13
5.1.2. Output Design 13
5.2. UML Diagrams 14
5.3. Use case Diagram. 15
5.4. Class Diagram 16
5.5. Sequence Diagram 17

iv
6 SYSTEM REQUIREMENTS 18
6.1. Hardware Requirements 18
6.2. Software Requirements 18

7 SOFTWARE ENIVIRONMENT 19-28


7.1. Python 19
7.1.1 Interactive Mode Programming 19
7.1.2 Script Mode Programming 20
7.2. Python Features 21
7.3. Django 22
8 IMPLEMENTATION 29-34
8.1. Modules 29
8.1.1. Service Provider 29
8.1.2. View and authorize user. 29
8.1.3. Remote user 29
8.2. Algorithms 30
8.2.1. Navie Bayes 30
8.2.2. SVM 31
8.2.3. Logistic Regression Classifiers 31
8.2.4. Decision Tree Classifiers. 32
8.3. Input and Output Design 33
9 SAMPLE OUTPUT 35-39
9.1. Homepage 35
9.2. Admin Login 35
9.3. Algorithm Accuracy 36
9.4. Tested Accuracy in Bar Chart 36
9.5. Trained Set Accuracy Results 37
9.6. Tweets Message Prediction Type Ratio 37
9.7. Line Chart Of Message Prediction Type 38

v
9.8. Pie Chart Of Message Prediction Type 38
9.9.User Interface 39
9.10. Sample Output 39

10 SYSTEM TESTING 40-42


10.1.Types of Tests 40
10.1.1. Unit Testing 40
10.1.2. Integration Testing 40
10.1.3. Functional Testing 41
10.1.4. System Testing 41
10.1.5. White Box Testing 41
10.1.6 Black Box Testing 41
10.2. User Acceptance Testing 42
10.2.1. Output Testing 42
10.3. Testing Strategy 42
11 CONCLUSION 43

REFERENCES 44-45

vi
LIST OF FIGURES
FIGURE NO. TITLE PAGE NO.

5.1 Architecture Design 12

5.3 Use Case Diagram 15

5.4 Class Diagram 16

5.5 Sequence Diagram 17

7.3.1 MVT Model 25

7.3.2 Django Flow 25

9.1 Homepage 35

9.2 Admin login 35

9.3 Algorithm Accuracy 36

9.4 Tested Accuracy in bar chart 36

9.5 Trained set Accuracy Results 37

9.6 Tweets Message prediction Type Ratio 37

9.7 Line Chart Of Message Prediction Type 38

9.8 Pie Chart Of Message Prediction Type 38

9.9 User Interface 39

9.10 Sample Output 39

vii
LIST OF ABBREVIATIONS

EBoW - Embeddings-Enhanced Bag-of-Words

MVC - Model-View- Controller

ORM - Object Relational mapping

XSS - Cross-Site Scripting

CSRF - Cross-Site Request Forgery

- Signed Network
SN

viii
CHAPTER 1

INTRODUCTION

The Internet has created never before seen opportunities for human interaction and
socialization. In the past decade, social media, in particular, has had a popularity explosion.
From Myspace to Facebook, Twitter, Flickr, and Instagram, people are connecting and
interacting in a way that was previously impossible. The widespread usage of social media
across people of all ages created a vast amount of data for several research topics, including
recommender systems [1], link predictions [2], visualization, and analysis of social
networks [3].

While the growth of social media has created an excellent platform for
communications and information sharing, it has also created a new platform for malicious
activities such as spamming [4], trolling [5], and cyber bullying [6]. According to the Cyber
bullying Research Center (CRC) [7], cyber bullying occurs when someone uses the
technology to send messages to harass, mistreat or threaten a person or a group. Unlike
traditional bullying where aggression is a short and temporary face to- face occurrence,
cyber bullying contains hurtful messages which are present online for a long time. These
messages can be accessed worldwide and are often irrevocable. Laws about cyber bullying
and how it is handled differ from one place to another. For example, in the United States,
the majority of the state’s incorporate cyberbullying into their bullying laws, and cyber
bullying is considered a criminal offense in most of them [8].

Popular social media platforms such as Face book and Twitter are very
vulnerable to cyber bullying due to the popularity of these social media sites and the
anonymity that the internet offers to the perpetrators. Although strict laws exist to punish
cyber bullying, there are very less tools available to effectively combat cyber bullying.
Social media platforms provide users with the option to self-report abusive behavior and
content in addition to providing tools to deal with bullying. For example, Twitter has
features that include locking accounts for a brief period of time or banning the accounts
when the behavior becomes unacceptable. The body of work produced by the research
community with regards to cyber bullying in social networks also needs to be expanded to
get better insights and help develop effective tools and techniques to tackle the issue.

1
To identify cyber bullies in social media, we first need to understand how social
media can be modeled. The common way of modeling relationship in social psychology
[9] is to represent it as a signed graph with a positive edge corresponding to the good intent
and negative edge corresponds to malicious intent between people. Using the signed graph,
we model the Twitter social network as a signed network to represent users’ behavior [10]
where nodes correspond to users, directed edges correspond to communications and/or
relations between the users with assigned weight in the range.
[-1, 1], as illustrated in Figure 1.

Definition 1: A signed social network (SSN) is a directed, weighted graph G =


(V; E; W), where V is the set of users and E _ V _ V is the set of edges with an edge weight
w 2 W in the range of [-1,1].

Mining social media networks to determine cyber bullies imposes several


challenges and concerns. First, it is typically hard to accurately interpret user’s intentions
and meanings in social media based merely on their messages (e.g. posts, tweets,
comments) which are typically short, use slang languages, or may include multimedia
contents such as pictures and videos. For example, Twitter limits its users’ messages to 140
characters, which could be a mix of text, slangs, emojis, and gifs. As a result, it is hard to
determine the opinion expressed by a message correctly. For this we utilize sentiment
analysis [11], [12] to determine whether the user’s attitude towards other users are positive,
negative, or neutral. Second, bullying could be hard to detect if the bully chooses to
disguise it through techniques such as sarcasm or passive aggression. In this situation, a
single text (message) cannot determine the user’s intention. So, we collect the entire
conversation between two or more users to identify the context in which the user attitude
exists. Third, the large size and dynamic and complex structure of social media networks
makes it challenging to identify cyber bullies. For example, on Twitter, hundreds of
millions of tweets are sent every day on the social network platform. In this case we
construct the social network as a graph and assign value based on the maliciousness of the
user. Because the network analysis reduces the complex relationship between the users to
the simple existence of nodes and edges [10] There are several works in the literature
concerning detecting malicious users from unsigned networks with positive edge weights,
including community detection [13], node classification [14] and link prediction [2]. On
the other hand, methods that analyze signed social networks are scarce [15].

2
In this paper, we study the problem of cyber bullying in social media in an attempt to
answer the following research question: Can tweet contexts (conversations) help improve
the detection of cyber bullying in Twitter? Our intuition is that each tweet should be
evaluated not only based on its contents, but also based on the context in which it exists.
We call such a context a conversation, which is a set of tweets between two or more people
exchanging information about a certain subject. Thus, our solution consists of three parts.
First, for each conversation, a conversation graph is generated based on the sentiment and
bullying words in the tweets. Second, we compute the bullying score for each pair of users
in a conversation graph, and then combine all graphs to create an SSN called bullying
signed network (B). The inclusion of negative links can bring out information that would
otherwise be missed with only positive links [16]. Finally, we propose a centrality measure
called attitude & merit (A&M) to detect bullying users from the signed network KB. Our
main contributions are organized as follows:

1) Collected, preprocessed and labelled the Twitter dataset.


2)Proposed a novel efficient algorithm for detecting cyber bullies on
Twitter.
a) Built conversation.
b) Constructed Bullying Signed Network.
c) Proposed Attitude and Merit Centrality.
3) Experimented on 5.6 million tweets collected over 6 months. The results show that
our approach can detect cyber bullies with high accuracy, while being scalable with respect
to the number of tweets.

3
CHAPTER 2

LITERATURE SURVEY
1. M. Di Capua, et al. [1] raises the unsupervised how to develop an online bullying model
based on a combination of features, based on traditional textual elements and other "social
features". Features were divided into 4 categories as Syntactic features, Semantic features,
Sentiment features, and Community features. The author has used the Growing
Hierarchical Self Map editing network (GHSOM), with 50 x grid 50 neurons and 20
elements as the insertion layer. M. Di Capua, and others used an integration algorithm k-
means to separate the input database and GHSOM in the Form spring database. The effects
of this hybrid the unsupervised way surpasses the previous one results. The author then
checked the YouTube database at 3 p.m. Different Machine Learning Models: Naive Bayes
Classifier, Decision Tree Classifier (C4.5), and Support Vector Machine (SVM) with
Linear Kernel. It was saw that the combined effects of hate speech turned around so that
we have lower accuracy in the YouTube database compared to Form Spring tests, as in the
text analysis and syntactical features work differently in on both sides. When this hybrid
method is used in Twitter Database, led to weak memory and F1 Score. The model
proposed by the authors can also be improved and used in building constructive mitigation
applications cyberbullying problems.

2. J. Yadav, et al. [2] suggests a new approach to this the discovery of internet
cyberbullying on social media in using a BERT model with a single line neural and was
tested on the Form spring forum and Wikipedia Database. The proposed model provided
performance 98% accuracy of spring Form databases and 96% accuracy in a relatively
comprehensive Wikipedia database previously used models. The proposed model provided
better Wikipedia database results due to its size g without the need for excessive sampling
while I The spring data form requires multiple samples.

3. R. R. Dalvi, et al. [3] suggests a way to do this detect and prevent online exploitation on
Twitter using Classified supervised machine learning algorithms. In this study, the live
Twitter API is used for compilation tweets and data sets. The proposed model tests both
Support Vector Machine and Naive Bayes on the data sets are collected. To remove a
feature, use it TFIDF vectorizer. The results show that it is accurate of an online bullying
model based on Vector Support. The machine is about 71.25% better than Naive Bayes
was almost 52.75%.

4
4. Trana R.E., et al. [4] The goal was to design a machine learning model to minimize
special events including text extracted from image memes. The author include a site that
contains approximately 19,000 text views that have been published on YouTube. This
study discusses the operation of three learning machines equipment, Uninformed Bayes,
Support Vector The machine, as well as the convolutional neural network used in on the
YouTube website and compare the results with existing details of the Form. They do not
write continuously investigate cyber bullying algorithms sub-sections within the YouTube
website. Naive Bayes beat SVM and CNN in the next four categories: race, nationality,
politics, and general. SVM passed well with the inexperienced Naïve Bayes again CNN is
in the same gender group, with all three algorithms showing equal performance with the
middle body group accuracy. The results of this study provided inaccurate data used to
distinguish between incidents of abuse and non-violence. Future work can focus on the
construction of a two-part separation system used to test text taken from photos to see that
the YouTube website provides a better context for aggression related collections.

5. N. Tsapatsoulis, et al. [5] detailed review of introduced cyberbullying on Twitter. The


importance of identifying the various abusers on Twitter is provided. Kwe paper, various
practical steps required the development of an effective and efficient application for
Internet traffic detection is well defined. I style involved in data classification and recording
platforms, machine learning models and feature types, and model studies using such tools
explained. This paper will serve as the first step in the process project on acquisition of
cyberbullying technology using machine reading.

6. G. A. León-Paredes et al. [6] they described the development of an online bullying


detection model is used Native Language Processing (NLP) and Mechanics Reading (ML).
Spanish Cyberbullying Prevention The system (SPC) was developed by installing the
machine learning strategies Naïve Bayes, Support Vector Machine, and Logistic
Regression. Database used this study was posted on Twitter. Plurality 93% accuracy was
achieved with e-ISSN: 2582-5208 International Research Journal of Modernization in
Engineering Technology and Science ( Peer-Reviewed, Open Access, Fully Refereed
International Journal ) Volume:04/Issue:05/May-2022 Impact Factor- 6.752
www.irjmets.com www.irjmets.com @International Research Journal of Modernization in
Engineering, Technology and Science [4528] the help of third parties’ techniques used.
Charges of online exploitation were found with the help of this system presented the
accuracy of 80% to 91% on average. Stemming and lemmatization techniques in NLP can

5
be used continuously to increase system accuracy. Such a model can and used for adoption
in English and local languages if possible.

7. P. K. Roy, et al. [7] details about creating a request for hate speech on Twitter via the
help of a deep neural convolutional network. Machine learning algorithms such as Logistic
Regression (LR), Random Forest (RF), Naïve Bayes (NB), Support Vector Machine
(SVM), Decision Tree (DT), Gradient Boosting (GB), and K-nearby Neighbors (KNN)
have it used to identify tweets related to hate speech in Twitter and features have been
removed using tf-idf process. The leading ML model was SVM, but it managed to predict
53% hate speech tweets on a 3: 1 database to be tested train. The reason behind the low
forecast scale was unequal data. The model is based on the prediction of hate speech tweets.
Advanced learning-based methods on the Convolutional Neural Network (CNN), Long-
Term Memory (LSTM), and its Content LSTM combinations (CLSTM) have the same
effects as separate distributed database. 10 times cross confirmation was used in
conjunction with the proposed DCNN model once you got a very good rate of memory. It
was 0.88 hate speech and 0.99 non-hate speech. Test results confirmed that the k-fold
opposite verification process is a better resolution with unequal data. In the future, the the
current database can be expanded for better performance accuracy.

8. S. M. Kargutkar, et al. [8] had proposed a program to give the characters twice for
cyberbullying. The system uses Convolutional Neural Network (CNN) and Keras content
testing as appropriate strategies then providing them.

6
CHAPTER 3

SYSTEM ANALYSIS

3.1 EXISTING SYSTEM

The first method of determining bullying messages was done using a combination of text-
based analytics and a mix of text and user features. Zhao et al. [18] proposed a text-based
Embeddings-Enhanced Bag-of-Words (EBoW) model that utilizes a concatenation of
bullying features, bag-of-words, and latent semantic features to obtain a final
representation, which is then passed through a classifier to identify cyberbullies.

Xu et al. [21] used textual information to identify emotions in bullying traces, as


opposed to determining whether or not a message was bullying. Singh et al. [19] proposed
a probabilistic socio-textual information fusion for cyberbullying detection. This fusion
uses social network features derived from a 1.5 ego network and textual features, such as
density of bad words and part-of-speech-tags. Hossein Mardi et al. [20] used images and
text to detect cyberbullying incidents. The text and image features were gathered from
media sessions containing images and the corresponding comments, which was then fed
into various classifiers. Chen [25] proposed a novel method in identifying cyberbullies
within a multi-modal context. To understand cyberbullying Kao et al. [26] proposed a
framework by studying social role detection. By using words and comments, temporal
characteristics, and social information of a session as well as peer influence Cheng et al.
[27], [28] proposed frameworks for detecting cyberbullies.

The second method was aimed at identifying the person behind the cyberbullying
incidents. Squicciarini et al. [22] used Myspace data to create a graph, which integrated
user, textual, and network features. This graph was used to detect cyberbullies and predict
the spreading of bullying behavior through node classification. Galan-García et al. [23]
used supervised machine learning to detect the real users behind troll profiles on Twitter
and demonstrated the technique in a real case of cyberbullying. In a recent paper on
aggression and bullying in Twitter, Chatzakou et al. [24] found cyberbullies and aggressors
using user, text, and network-based features.

7
3.2 EXSISTING SYSTEM DISADVANTAGES

❖ The system is less effective due to lack of Constructing bullying signed network.
❖ The system isn’t effective due to lack of training large scale datasets.

3.3 PROPOSED SYSTEM


In the proposed system, the system studies the problem of cyberbullying in social media in
an attempt to answer the following research question: Can tweet contexts (conversations)
help improve the detection of cyberbullying in Twitter? Our intuition is that each tweet
should be evaluated not only based on its contents, but also based on the context in which
it exists. The system calls such a context a conversation, which is a set of tweets between
two or more people exchanging information about a certain subject. Thus, our solution
consists of three parts. First, for each conversation, a conversation graph is generated based
on the sentiment and bullying words in the tweets. Second, we compute the bullying score
for each pair of users in a conversation graph, and then combine all graphs to create an
SSN called bullying signed network (B). The inclusion of negative links can bring out
information that would otherwise be missed with only positive links [16]. Finally, we
propose a centrality measure called attitude & merit (A&M) to detect bullying users from
the signed network B.

Our main contributions are organized as follows:

1) Collected, preprocessed and labelled the Twitter dataset.

2) Proposed a novel efficient algorithm for detecting cyberbullies on Twitter.

a) Built conversation.

b) Constructed Bullying Signed Network.

c) Proposed Attitude and Merit Centrality.

3) Experimented on 5.6 million tweets collected over 6 months. The results show that our
approach can detect cyberbullies with high accuracy, while being scalable with respect to
the number of tweets.

8
3.4 PROPOSED SYSTEM ADVANTAGES

❖ The system is more effective due to the presence of Conversation Graph Generation
Algorithm, Bullying Signed Network Generation Algorithm, and Bully Finding
Algorithm.
❖ The system is more effective due to the techniques to analyze large number of datasets.

9
CHAPTER 4

SYSTEM STUDY

4.1 FEASIBILITY STUDY

The feasibility of the project is analyzed in this phase and the business proposal is put
forth with a very general plan for the project and some cost estimates. During system
analysis the feasibility study of the proposed system is to be carried out. This is to ensure
that the proposed system is not a burden to the company. For feasibility analysis, some
understanding of the major requirements for the system is essential.

Three key considerations involved in the feasibility analysis are.

 ECONOMICAL FEASIBILITY

 TECHNICAL FEASIBILITY

 SOCIAL FEASIBILITY

4.2 ECONOMIC FEASIBILITY

This study is carried out to check the economic impact that the system will have
on the organization. The amount of funds that the company can pour into the research and
development of the system is limited. The expenditure must be justified. Thus, the
developed system as well within the budget and this was achieved because most of the
technologies used are freely available. Only the customized products had to be purchased.

4.3 TECHNICAL FEASIBILITY

This study is carried out to check the technical feasibility, that is, the technical
requirements of the system. Any system developed must not have a high demand for the
available technical resources. This will lead to high demand for the available technical
resources. This will lead to high demands being placed on the client. The developed system
must have a modest requirement, as only minimal or null changes are required for
implementing this system.

10
4.4 SOCIAL FEASIBILITY

The aspect of study is to check the level of acceptance of the system by the user.
This includes the process of training the user to use the system efficiently. The user must
not feel threatened by the system, instead must accept it as a necessity. The level of
acceptance by the users solely depends on the methods that are employed to educate the
user about the system and to make him familiar with it. His level of confidence must be
raised so that he is also able to make some constructive criticism, which is welcomed, as
he is the final user of the system.

11
CHAPTER 5

SYSTEM DESIGN

5.1 ARCHITECTURE DESIGN

FIGURE 5.1:ARCHITECTURE DIAGRAM

12
5.1.1 INPUT DESIGN

Input Design plays a vital role in the life cycle of software development, it requires very
careful attention of developers. The input design is to feed data to the application as
accurately as possible. So inputs are supposed to be designed effectively so that the errors
occurring while feeding are minimized. According to Software Engineering Concepts, the
input forms or screens are designed to provide to have a validation control over the input
limit, range and other related validations.

This system has input screens in almost all the modules. Error messages are
developed to alert the user whenever he commits some mistakes and guides him in the right
way so that invalid entries are not made. Let us see deeply about this under module design.

Input design is the process of converting the user created input into a computer-
based format. The goal of the input design is to make the data entry logical and free from
errors. The error is in the input are controlled by the input design. The application has been
developed in user-friendly manner. The forms have been designed in such a way during
the processing the cursor is placed in the position where must be entered. The user is also
provided with in an option to select an appropriate input from various alternatives related
to the field in certain cases.

Validations are required for each data entered. Whenever a user enters an
erroneous data, error message is displayed, and the user can move on to the subsequent
pages after completing all the entries in the current page.

5.1.2 OUTPUT DESIGN


The Output from the computer is required to mainly create an efficient method
of communication within the company primarily among the project leader and his team
members, in other words, the administrator and the clients. The output of VPN is the system
which allows the project leader to manage his clients in terms of creating new clients and
assigning new projects to them, maintaining a record of the project validity and providing
folder level access to each client on the user side depending on the projects allotted to him.
After completion of a project, a new project may be assigned to the client. User
authentication procedures are maintained at the initial stages itself. A new user may be

13
created by the administrator himself or a user can himself register as a new user but the
task of assigning projects and validating a new user rest with the administrator only.

The application starts running when it is executed for the first time. The server
has to be started and then the internet explorer in used as the browser. The project will run
on the local area network so the server machine will serve as the administrator while the
other connected systems can act as the clients. The developed system is highly user friendly
and can be easily understood by anyone using it even for the first time.

5.2 UML DIAGRAMS


UML stands for Unified Modeling Language. UML is a standardized general-purpose
modeling language in the field of object-oriented software engineering. The standard is
managed, and was created by, the Object Management Group.

The goal is for UML to become a common language for creating models of object-
oriented computer software. In its current form UML is comprised of two major
components: a Meta-model and a notation. In the future, some form of method or process
may also be added to; or associated with, UML.

The Unified Modeling Language is a standard language for specifying,


Visualization, Constructing and documenting the artifacts of software system, as well as
for business modeling and other non-software systems.

The UML represents a collection of best engineering practices that have proven
successful in the modeling of large and complex systems.

The UML is a very important part of developing object-oriented software and the
software development process. UML uses mostly graphical notations to express the design
of software projects.

14
5.3 USECASE DIAGRAM

FIGURE 5.3: USE CASE DIAGRAM

15
5.4 CLASS DIAGRAM

FIGURE 5.4:CLASS DIAGRAM

16
5.5 SEQUENCE DIAGRAM

FIGURE 5.5: SEQUENCE DIAGRAM

17
CHAPTER 6

SYSTEM REQUIREMENTS

6.1 HARDWARE REQUIREMENTS


➢ Processor - Pentium -IV
➢ RAM - 4 GB (min)
➢ Hard Disk - 20 GB
➢ Keyboard - Standard Windows Keyboard
➢ Mouse - Two or Three Button Mouse
➢ Monitor - SVGA

6.2 SOFTWARE REQUIREMENTS


➢ Operating System : Windows 7 Ultimate
➢ Coding Language : Python
➢ Front-End : Python
➢ Back-End : Django-ORM
➢ Designing : Html, CSS, JavaScript.
➢ Data Base : MySQL (WAMP Server)

18
CHAPTER 7

SOFTWARE ENVIRONMENT

7.1 PYTHON

Python is a high-level, interpreted, interactive and object-oriented scripting language.


Python is designed to be highly readable. It uses English keywords frequently whereas
other languages use punctuation, and it has fewer syntactical constructions than other
languages.

• Python is Interpreted: Python is processed at runtime by the interpreter. You do


not need to compile your program before executing it. This is similar to PERL and
PHP.

• Python is Interactive: You can actually sit at a Python prompt and interact with
the interpreter directly to write your programs.

• Python is Object-Oriented: Python supports Object-Oriented style or technique


of programming that encapsulates code within objects.

Python is a Beginner's Language: Python is a great language for beginner-level


programmers and supports the development of a wide range of applications from simple
text processing to WWW browsers to games.

7.1.1 Interactive Mode Programming

Invoking the interpreter without passing a script file as a parameter brings up the following
prompt −

$ python

Python 2.4.3 (#1, Nov 11, 2010, 13:34:43) [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on
linux2

Type "help", "copyright", "credits" or "license" for more information.

>>>

19
Type the following text at the Python prompt and press the Enter −

>>> print "Hello, Python!"

If you are running new version of Python, then you would need to use print statement with
parenthesis as in print ("Hello, Python!");. However, in Python version 2.4.3, this produces
the following result −

Hello, Python!

7.1.2 Script Mode Programming

Invoking the interpreter with a script parameter begins execution of the script and continues
until the script is finished. When the script is finished, the interpreter is no longer active.

Let us write a simple Python program in a script. Python files have extension .py. Type the
following source code in a test.py file −

Live Demo:

print "Hello, Python!"

We assume that you have Python interpreter set in PATH variable. Now, try to run this
program as follows −

$ python test.py

This produces the following result −

Hello, Python!

Let us try another way to execute a Python script. Here is the modified test.py file −

Live Demo

#!/usr/bin/python

print "Hello, Python!"

We assume that you have Python interpreter available in /usr/bin directory. Now, try to run
this program as follows −

$ chmod +x test.py # This is to make file executable.

20
$./test.py

This produces the following result −

Hello, Python!

7.2 Python Features


Python's features include:

• Easy-to-learn: Python has few keywords, simple structure, and a clearly defined
syntax. This allows the student to pick up the language quickly.

• Easy-to-read: Python code is more clearly defined and visible to the eyes.

• Easy-to-maintain: Python's source code is fairly easy-to-maintain.

• A broad standard library: Python's bulk of the library is very portable and cross-
platform compatible on UNIX, Windows, and Macintosh

• Interactive Mode: Python has support for an interactive mode which allows
interactive testing and debugging of snippets of code.

• Portable: Python can run on a wide variety of hardware platforms and has the
same interface on all platforms.

• Extendable: You can add low-level modules to the Python interpreter. These
modules enable programmers to add to or customize their tools to be more
efficient.

• Databases: Python provides interfaces to all major commercial databases.

• GUI Programming: Python supports GUI applications that can be created and
ported to many system calls, libraries and windows systems, such as Windows
MFC, Macintosh, and the X Window system of Unix.

• Scalable: Python provides a better structure and support for large programs than
shell scripting.

21
Python has a big list of good features:

• It supports functional and structured programming methods as well as OOP.

• It can be used as a scripting language or can be compiled to byte-code for building


large applications.

• It provides very high-level dynamic data types and supports dynamic type
checking.

• IT supports automatic garbage collection.

• It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java

7.3. DJANGO

Django: A Web Framework for Python

Django is a high-level web framework that enables developers to build web applications
rapidly by providing a robust foundation of tools, patterns, and components. It follows the
"batteries included" philosophy, which means that it includes many built-in features and
utilities, allowing developers to focus on creating unique and functional applications rather
than reinventing the wheel.

Key Features:

1. MVC Architecture: Django follows the Model-View-Controller (MVC) architectural


pattern, where models represent the data structures, views handle the presentation logic,
and controllers manage the flow of data between models and views. However, in Django,
the term "controller" is referred to as "views."

2. ORM (Object-Relational Mapping): Django provides a powerful Object-Relational


Mapping system that abstracts the database interactions, allowing developers to define
models as Python classes. These classes map to database tables, and operations on objects
are automatically translated to SQL queries.

22
3. Admin Interface: Django's admin panel is an automatic, customizable interface for
managing the application's data. It enables developers to create, read, update, and delete
records without having to write extensive code for CRUD operations.

4. URL Routing: Django's URL routing system helps in mapping URLs to specific views.
This decouples the URL structure from the actual views and enhances maintainability.

5. Template Engine: Django's template engine allows developers to create dynamic


HTML pages by embedding Python code within HTML templates. This separation of
concerns enhances code readability and reusability.

6. Form Handling: Django provides a robust form handling system that simplifies form
creation, validation, and processing. It includes built-in security features to protect against
common web vulnerabilities like Cross-Site Scripting (XSS) and Cross-Site Request
Forgery (CSRF).

7. Security: Django emphasizes security by default. It offers protection against SQL


injection, clickjacking, and more. It also provides user authentication and authorization
features.

8. REST Framework: Django's REST framework simplifies the creation of RESTful


APIs. It offers tools for serialization, authentication, permissions, and viewsets, making it
easier to develop API endpoints.

Components:

1. Models: Models define the structure of the application's data and how it interacts with
the database. Django's ORM enables developers to define models using Python classes and
automatically generates the necessary database schema.

2. Views: Views handle the logic behind rendering web pages. They process data, interact
with models, and pass data to templates for rendering.

3. Templates: Templates are responsible for generating HTML dynamically. They allow
developers to mix HTML with template tags and variables, enabling the creation of
dynamic and data-driven pages.

23
4. URLs and Routing: The `urls.py` file defines the URL routing configuration for the
application. It maps URLs to views, allowing users to access different parts of the
application.

5. Forms: Django's form handling system simplifies user input validation and data
processing. It helps in creating and handling HTML forms while providing security against
common web vulnerabilities.

6. Admin: Django's admin interface is a powerful tool for managing application data. It is
automatically generated based on the defined models and allows for easy data management
without writing custom admin views.

7. Middleware: Middleware is a series of hooks that process requests and responses


globally before reaching the view. It can be used for tasks like authentication, security
checks, and more.

Django's rich ecosystem, extensive documentation, and active community make it a go-to
choose for building a wide range of web applications, from simple prototypes to complex,
high-traffic sites.

Django is a high-level Python Web framework that encourages rapid development and
clean, pragmatic design. Built by experienced developers, it takes care of much of the
hassle of Web development, so you can focus on writing your app without needing to
reinvent the wheel. It’s free and open source.

Django's primary goal is to ease the creation of complex, database-driven websites. Django
emphasizes reusability and "pluggability" of components, rapid development, and the
principle of don't repeat yourself. Python is used throughout, even for settings files and
data models.

Django also provides an optional administrative create, read, update and delete interface
that is generated dynamically through introspection and configured via admin models.

24
FIGURE 7.3.1: MVT Model

Create a Project

FIGURE 7.3.2: Django flow

25
Whether you are on Windows or Linux, just get a terminal or a cmd prompt and navigate
to the place you want your project to be created, then use this code −

$ Django-admin start project myproject

This will create a "myproject" folder with the following structure −

myproject/

manage.py

myproject/

__init__.py

settings.py

urls.py

wsgi.py

The Project Structure

The “myproject” folder is just your project container, it actually contains two elements −

manage.py − This file is kind of your project local Django-admin for interacting with your
project via command line (start the development server, sync db...). To get a full list of
command accessible via manage.py you can use the code −

$ python manage.py help

The “myproject” subfolder − This folder is the actual python package of your project. It
contains four files −

__init__.py − Just for python, treat this folder as package.

settings.py − As the name indicates, your project settings.

urls.py − All links of your project and the function to call. A kind of ToC of your project.

wsgi.py − If you need to deploy your project over WSGI.

26
Setting Up Your Project

Your project is set up in the subfolder myproject/settings.py. Following are some important
options you might need to set −

DEBUG = True

This option lets you set if your project is in debug mode or not. Debug mode lets you get
more information about your project's error. Never set it to ‘True’ for a live project.
However, this has to be set to ‘True’ if you want the Django light server to serve static
files. Do it only in the development mode.

DATABASES = {

'default': {

'ENGINE': 'django.db.backends.sqlite3',

'NAME': 'database.sql',

'USER': '',

'PASSWORD': '',

'HOST': '',

'PORT': '',

The database is set in the ‘Database’ dictionary. The example above is for SQLite engine.
As stated earlier, Django also supports -

MySQL (django.db.backends.mysql)

27
PostGreSQL (django.db.backends.postgresql_psycopg2)

Oracle (django.db.backends.oracle) and NoSQL DB

MongoDB (django_mongodb_engine)

Before setting up any new engine, make sure you have the correct db driver installed.

INSTALLED_APPS = (

'django.contrib.admin',

'django.contrib.auth',

'django.contrib.contenttypes',

'django.contrib.sessions',

'django.contrib.messages',

'django.contrib.staticfiles',

'myapp',

28
CHAPTER 8

IMPLEMENTATION

8.1 MODULES
➢ Service Provider
➢ View and Authorize Users
➢ Remote User

8.1.1 Service Provider


In this module, the Service Provider has to login by using a valid username and password.
After login successful he can do some operations such as Login, Train and Test Data Sets,
View Trained and Tested Accuracy in Bar Chart, View Trained and Tested Accuracy
Results, View Tweets Message Predict Type, Find Tweets Message Prediction Type Ratio,
Download Tweet Message Prediction Type, View Tweet Message Prediction Type Ratio
Results, View All Remote Users.

8.1.2 View and Authorize Users

In this module, the admin can view the list of users who all registered. In this,
the admin can view the user’s details such as, username, email, address and admin
authorize the users.

8.1.3 Remote User


In this module, there are ‘n’ number of users that are present. Users should
register before doing any operations. Once the user registers, their details will be stored in
the database. After registration successfully, he has to login by using authorized username
and password. Once Login is successful user will do some operations like REGISTER
AND LOGIN, PREDICT TWEETS MESSAGE TYPE, VIEW YOUR PROFILE.

29
8.2 ALGORITHMS

8.2.1 NAVIE BAYES


The naive bayes approach is a supervised learning method which is based on a simplistic
hypothesis: it assumes that the presence (or absence) of a particular feature of a class is
unrelated to the presence (or absence) of any other feature .
Yet, despite this, it appears robust and efficient. Its performance is comparable to other
supervised learning techniques. Various reasons have been advanced in the literature. In
this tutorial, we highlight an explanation based on the representation bias. The naive bayes
classifier is a linear classifier, as well as linear discriminant analysis, logistic regression or
linear SVM (support vector machine). The difference lies on the method of estimating the
parameters of the classifier (the learning bias).

While the Naive Bayes classifier is widely used in the research world, it is not
widespread among practitioners which want to obtain usable results. On the one hand, the
researchers found it is especially easy to program and implement it, its parameters are easy
to estimate, learning is very fast even on very large databases, its accuracy is reasonably
good in comparison to the other approaches. On the other hand, the final users do not obtain
a model that is easy to interpret and deploy, they do not understand the interest of such a
technique.

Thus, we introduce in a new presentation the results of the learning process. The
classifier is easier to understand, and its deployment is also made easier. In the first part of
this tutorial, we present some theoretical aspects of the naive bayes classifier. Then, we
implement the approach on a dataset with Tanagra. We compare the obtained results (the
parameters of the model) to those obtained with other linear approaches such as the logistic
regression, the linear discriminant analysis and the linear SVM. We note that the results
are highly consistent. This largely explains the good performance of the method in
comparison to others. In the second part, we use various tools on the same dataset.

30
8.2.2 SVM
In classification tasks a discriminant machine learning technique aims at finding, based on
an independent and identically distributed (iid) training dataset, a discriminant function
that can correctly predict labels for newly acquired instances. Unlike generative machine
learning approaches, which require computations of conditional probability distributions,
a discriminant classification function takes a data point x and assigns it to one of the
different classes that are a part of the classification task. Less powerful than generative
approaches, which are mostly used when prediction involves outlier detection, discriminant
approaches require fewer computational resources and less training data, especially for a
multidimensional feature space and when only posterior probabilities are needed. From a
geometric perspective, learning a classifier is equivalent to finding the equation for a
multidimensional surface that best separates the different classes in the feature space.

SVM is a discriminant technique, and, because it solves the convex optimization


problem analytically, it always returns the same optimal hyperplane parameter—in contrast
to genetic algorithms (GAs) or perceptron’s, both of which are widely used for
classification in machine learning. For perceptron’s, solutions are highly dependent on the
initialization and termination criteria. For a specific kernel that transforms the data from
the input space to the feature space, training returns uniquely defined SVM model
parameters for a given training set, whereas the perceptron and GA classifier models are
different each time training is initialized. The aim of GAs and perceptron’s is only to
minimize error during training, which will translate into several hyperplanes’ meeting this
requirement.

8.2.3 Logistic regression Classifiers

Logistic regression analysis studies the association between a categorical dependent


variable and a set of independent (explanatory) variables. The name logistic regression is
used when the dependent variable has only two values, such as 0 and 1 or Yes and No. The
name multinomial logistic regression is usually reserved for the case when the dependent
variable has three or more unique values, such as Married, Single, Divorced, or Widowed.
Although the type of data used for the dependent variable is different from that of multiple
regression, the practical use of the procedure is similar.

31
Logistic regression competes with discriminant analysis as a method for
analyzing categorical-response variables. Many statisticians feel that logistic regression is
more versatile and better suited for modeling most situations than is discriminant analysis.
This is because logistic regression does not assume that the independent variables are
normally distributed, as discriminant analysis does.

This program computes binary logistic regression and multinomial logistic


regression on both numeric and categorical independent variables. It reports on the
regression equation as well as the goodness of fit, odds ratios, confidence limits, likelihood,
and deviance. It performs a comprehensive residual analysis including diagnostic residual
reports and plots. It can perform an independent variable subset selection search, looking
for the best regression model with the fewest independent variables. It provides confidence
intervals on predicted values and provides ROC curves to help determine the best cutoff
point for classification. It allows you to validate your results by automatically classifying
rows that are not used during the analysis.

8.2.4 Decision tree classifiers

Decision tree classifiers are used successfully in many diverse areas. Their most important
feature is the capability of capturing descriptive decision-making knowledge from the
supplied data. Decision tree can be generated from training sets. The procedure for such
generation based on the set of objects (S), each belonging to one of the classes C1, C2, …,
Ck is as follows:

Step 1. If all the objects in S belong to the same class, for example Ci, the decision tree for
S consists of a leaf labeled with this class.
Step 2. Otherwise, let T be some test with possible outcomes O1, O2…, On. Each object
in S has one outcome for T so the test partitions S into subsets S1, S2… Sn where each
object in Si has outcome Oi for T. T becomes the root of the decision tree and for each
outcome Oi we build a subsidiary decision tree by invoking the same procedure recursively
on the set Si.

32
8.3 INPUT AND OUTPUT DESIGN

INPUT DESIGN

Input Design plays a vital role in the life cycle of software development, it requires very
careful attention from developers. The input design is to feed data to the application as
accurately as possible. So, inputs are supposed to be designed effectively so that the errors
occurring while feeding are minimized. According to Software Engineering Concepts, the
input forms or screens are designed to provide to have a validation control over the input
limit, range and other related validations.

This system has input screens in almost all the modules. Error messages are
developed to alert the user whenever he commits some mistakes and guides him in the right
way so that invalid entries are not made. Let us look deeply into this module design.

Input design is the process of converting the user created input into a computer-
based format. The goal of the input design is to make the data entry logical and free from
errors. The error is in the input are controlled by the input design. The application has been
developed in a user-friendly manner. The forms have been designed in such a way that
during the processing the cursor is placed in the position where must be entered. The user
is also provided with an option to select an appropriate input from various alternatives
related to the field in certain cases.

Validations are required for each data entered. Whenever a user enters an erroneous
data, an error message is displayed, and the user can move on to the subsequent pages after
completing all the entries in the current page.

33
OUTPUT DESIGN
The Output from the computer is required to mainly create an efficient method of
communication within the company primarily among the project leader and his team
members, in other words, the administrator and the clients.

The output of VPN is the system which allows the project leader to manage his
clients in terms of creating new clients and assigning new projects to them, maintaining a
record of the project validity and providing folder level access to each client on the user
side depending on the projects allotted to him. After completion of a project, a new project
may be assigned to the client. User authentication procedures are maintained at the initial
stages itself. A new user may be created by the administrator himself or a user can himself
register as a new user but the task of assigning projects and validating a new user rest with
the administrator only.

The application starts running when it is executed for the first time. The server
has to be started and then the internet explorer is used as the browser. The project will run
on the local area network so the server machine will serve as the administrator while the
other connected systems can act as the clients. The developed system is highly user friendly
and can be easily understood by anyone using it even for the first time.

34
CHAPTER 9

SAMPLE OUTPUT

9.1 HOME PAGE:

FIGURE 9.1: HOMEPAGE

9.2 ADMIN LOGIN:

FIGURE 9.2 ADMIN LOGIN

35
9.3 ALGORITHM ACCURACY:

FIGURE 9.3: ALGORITHM ACCURACY

9.4 TESTED ACCURACY IN BAR CHART

FIGURE 9.4: TESTED ACCURACY IN BAR CHART

36
9.5 TRAINED SET ACCURACY RESULTS

FIGURE 9.5: TRAINED SET ACCURACY RESULTS

9.6 TWEETS MESSAGE PREDICTION TYPE RATIO

FIGURE 9.6: TWEETS MESSAGE PREDICTION TYPE RATIO

37
9.7 LINE CHART OF MESSAGE PREDICTION TYPE

FIGURE 9.7: LINE CHART OF MESSAGE PREDICTION TYPE

9.8 PIE CHART OF MESSAGE PREDICTION TYPE

FIGURE 9.8. PIECHART OF MESSAGE PREDICTION TYPE

38
9.9 USER INTERFACE

FIGURE 9.9:USER INTERFACE

9.10 SAMPLE OUTPUT

FIGURE 9.10:SAMPLE OUTPUT

39
CHAPTER 10

SYSTEM TESTING

The purpose of testing is to discover errors. Testing is the process of trying to discover
every conceivable fault or weakness in a work product. It provides a way to check the
functionality of components, subassemblies, assemblies and/or a finished product It is the
process of exercising software with the intent of ensuring that the Software system meets
its requirements and user expectations and does not fail in an unacceptable manner. There
are various types of test. Each test type addresses a specific testing requirement.

10.1 TYPES OF TESTS:


10.1.1 Unit testing:
Unit testing involves the design of test cases that validate that the internal
program logic is functioning properly, and that program inputs produce valid outputs. All
decision branches and internal code flow should be validated. It is the testing of individual
software units of the application .it is done after the completion of an individual unit before
integration. This is a structural testing, that relies on knowledge of its construction and is
invasive. Unit tests perform basic tests at component level and test a specific business
process, application, and/or system configuration. Unit tests ensure that each unique path
of a business process performs accurately to the documented specifications and contains
clearly defined inputs and expected results.

10.1.2 Integration testing:

Integration tests are designed to test integrated software components to


determine if they actually run as one program. Testing is event driven and is more
concerned with the basic outcome of screens or fields. Integration tests demonstrate that
although the components were individually satisfaction, as shown by successfully unit
testing, the combination of components is correct and consistent. Integration testing is
specifically aimed at exposing the problems that arise from the combination of
components.

40
10.1.3 Functional test:

Functional tests provide systematic demonstrations that functions tested are


available as specified by the business and technical requirements, system documentation,
and user manuals.

Functional testing is centered on the following items:

Valid Input : identified classes of valid input must be accepted.

Invalid Input : identified classes of invalid input must be rejected.

Functions : identified functions must be exercised.

Output : identified classes of application outputs must be exercised.

10.1.4 System Test:


System testing ensures that the entire integrated software system meets
requirements. It tests a configuration to ensure known and predictable results. An example
of system testing is the configuration-oriented system integration test. System testing is
based on process descriptions and flows, emphasizing pre-driven process links and
integration points.

10.1.5 White Box Testing:


White Box Testing is a testing in which in which the software tester has
knowledge of the inner workings, structure, and language of the software, or at least its
purpose. It is the purpose. It is used to test areas that cannot be reached from a black box
level.

10.1.6 Black Box Testing:


Black Box Testing is testing the software without any knowledge of the inner
workings, structure or language of the module being tested. Black box tests, as most other
kinds of tests, must be written from a definitive source document, such as specification or
requirements document, such as specification or requirements document. It is a testing in
which the software under test is treated, as a black box. You cannot “see” into it. The test
provides inputs and responds to outputs without considering how the software works.

41
10.2 User Acceptance Testing

User Acceptance of a system is the key factor for the success of any system. The
system under consideration is tested for user acceptance by constantly keeping in touch
with the prospective system users at the time of development and making changes wherever
required. The system developed provides a friendly user interface that can easily be
understood even by a person who is new to the system.

10.2.1 Output Testing

After performing the validation testing, the next step is output testing of the
proposed system, since no system could be useful if it does not produce the required output
in the specified format. Asking the users about the format required by them tests the outputs
generated or displayed by the system under consideration. Hence the output format is
considered in 2 ways – one is on screen and another in printed format.

.
10.3 TESTING STRATEGY :

A strategy for system testing integrates system test cases and design techniques
into a well-planned series of steps that results in the successful construction of software.
The testing strategy must co-operate test planning, test case design, test execution, and the
resultant data collection and evaluation .A strategy for software testing must accommodate
low-level tests that are necessary to verify that a small source code segment has been
correctly implemented as well as high level tests that validate major system functions
against user requirements.

Software testing is a critical element of software quality assurance and represents


the ultimate review of specification design and coding. Testing represents an interesting
anomaly for the software. Thus, a series of testing are performed for the proposed
system before the system is ready for user acceptance testing.

42
CHAPTER 11

CONCLUSION
Although the digital revolution and the rise of social media enabled great advances in
communication platforms and social interactions, a wider proliferation of harmful behavior
known as bullying has also emerged. This paper presents a novel framework of Bully Net
to identify bully users from the Twitter social network. We performed extensive research
on mining signed networks for better understanding of the relationships between users in
social media, to build a signed network (SN) based on bullying tendencies. We observed
that by constructing conversations based on the context as well as content, we could
effectively identify the emotions and the behavior behind bullying. In our experimental
study, the evaluation of our proposed centrality measures to detect bullies from signed
network, we achieved around 80% accuracy with 81% precision in identifying bullies for
various cases.

There are still several open questions deserving further investigation. First, our
approach focuses on extracting emotions and behavior from texts and emojis in tweets.
However, it would be interesting to investigate images and videos, given that many users
use them to bully others. Second, it does not distinguish between bullying and aggressive
users. Devising new algorithms or techniques to distinguish bullies from aggressors would
prove critical in better identification of cyber bullies. Another topic of interest would be to
study the relationship between conversation graph dynamics and geographic location and
how these dynamics are affected by the geographic dispersion of the users? Does proximity
increase bullying behavior?

43
REFERENCES
[1] J. Tang, C. Aggarwal, and H. Liu, “Recommendations in signed social. networks,” in
Proceedings of the International Conference on WWW,2016, pp. 31–40.
[2] D. Liben-Nowell and J. Kleinberg, “The link-prediction problem for social networks,”
Proceedings of the ASIS&T, vol. 58, no. 7, pp. 1019– 1031, 2007.
[3] U. Brandes and D. Wagner, “Analysis and visualization of social networks,” in Graph
drawing software, 2004, pp. 321–340.
[4] X. Hu, J. Tang, H. Gao, and H. Liu, “Social spammer detection with sentiment
information,” In Proceedings of IEEE ICDM, pp. 180––189,2014.
[5] E. E. Buckels, P. D. Trapnell, and D. L. Paulhus, Trolls just want to have fun, 2014, pp.
67:97–102.
[6] S. Kumar, F. Spezzano, and V. Subrahmanian, “Accurately detecting. trolls in Slashdot
zoo via decluttering,” in Proceedings of IEEE/ACMASONAM, 2014, pp. 188–195.
[7] J. W. Patchin and S. Hinduja, “2016 cyberbullying data,” 2017.
[8] C. R. Center, “https://cyberbullying.org/bullying-laws.”
[9] D. Cartwright and F. Harary, “Structural balance: a generalization of Heider’s theory.”
Psychological review, vol. 63, no. 5, p. 277, 1956.
[10] J. Leskovec, D. Huttenlocher, and J. Kleinberg, “Signed networks in social media,” in
Proceedings of the SIGCHI CHI, 2010, pp. 1361–1370.
[11] R. Plutchik, “A general psych evolutionary theory of emotion,” in Theories of
emotion, 1980, pp. 3–33.
[12] W. Medhat, A. Hassan, and H. Korashy, “Sentiment analysis algorithms and
applications: A survey,” Proceedings of the Ain Shams engineering journal, vol. 5, no. 4,
pp. 1093–1113, 2014.
[13] L. Tang and H. Liu, “Community detection and mining in social media, “Synthesis
lectures on data mining and knowledge discovery, vol. 2, no. 1,pp. 1–137, 2010.
[14] S. Bhagat, G. Cormode, and S. Muthukrishnan, “Node classification in social
networks,” in Social network data analytics, 2011, pp. 115–148.
[15] J. Tang, Y. Chang, C. Aggarwal, and H. Liu, “A survey of signed network mining in
social media,” In Proceedings of the ACM Comput. Surv., no. 3, pp. 42:1–42:37, 2016.
[16] J. Kunegis, J. Preusse, and F. Schwagereit, “What is the added value of negative links
in online social networks?” in Proceedings of the International Conference on WWW,
2013, pp. 727–736.
[17] Z. Wu, C. C. Aggarwal, and J. Sun, “The troll-trust model for ranking in signed
networks,” in Proceedings of the ACM International Conference on WSDM, 2016, pp.
447–456.
[18] R. Zhao, A. Zhou, and K. Mao, “Automatic detection of cyberbullying on social
networks based on bullying features,” in Proceedings of the ICDCN, 2016.
[19] V. K. Singh, Q. Huang, and P. K. Atrey, “Cyberbullying detection using. socio-textual
information fusion,” In Proceedings of the IEEE/ACM ASONAM, pp. 884—-887, 2016.
[20] H. Hosseinmardi, S. A. Mattson, R. I. Rafiq, R. Han, Q. Lv, and S. Mishra, “Detection
of cyberbullying incidents on the Instagram social network,” In Proceedings of the CoRR,
2015.
[21] J.-M. Xu, X. Zhu, and A. Bellmore, “Fast learning for sentiment analysis on bullying,”
in Proceedings of the First International WISDOM, 2012, pp. 10:1–10:6.

[22] A. C. Squicciarini, S. M. Rajtmajer, Y. Liu, and C. H. Griffin, “Identification and


characterization of cyberbullying dynamics in an online social network,” in Proceedings of
the IEEE/ACM ASONAM, 2015, pp. 280–285.

44
[23] P. Galan-Garcia, J. De La Puerta, C. Gómez, I. Santos, and P. Bringas, “Supervised
machine learning for the detection of troll profiles in twitter social network: Application to
a real case of cyberbullying,” vol. 24, pp. 42–53, 2014.
[24] D. Chatzakou, N. Kourtellis, J. Blackburn, E. De Cristofaro, G.Stringhini,and A.
Vakali, “Mean birds: Detecting aggression and bullying on twitter,” in Proceedings of the
ACM on WebSci, 2017, pp. 13–22.
[25] L. Cheng, J. Li, Y. N. Silva, D. L. Hall, and H. Liu, “Xbully: Cyberbullying detection
within a multi-modal context,” in Proceedings of the Twelfth ACM International
Conference on Web Search and Data Mining, 2019, pp. 339–347.

45

You might also like