On
BACHELOR OF TECHNOLOGY
In
CERTIFICATE
This is to certify that the mini project report entitled “MALICIOUS TWITTER
BOTS DETECTION USING MACHINE LEARNING” is being submitted by
V.YOSHITHA PRIYANKA (19D21A05G6), M.SHIREESHA (19D21A05E6),
M.POORVAJA (19D21A05E2) in partial fulfilment for the award of the degree of
Bachelor of Technology in Computer Science and Engineering, and is a record of
bonafide work carried out by them.
EXTERNAL EXAMINER
Certificate of Internship
Head Office: 204, Ratna Complex, Image Hospital Lane, Beside DHFL, Ameerpet
Hyderabad - 500073, Contact: 7036987111/222/333/444
Branch Office: 2nd Floor, Datta Lord House, Behind PVP Cinemas, M.G.Road, Labbipet
Vijayawada - 520002, Contact: 7036987666
DECLARATION
We hereby declare that the mini project entitled “MALICIOUS TWITTER
BOTS DETECTION USING MACHINE LEARNING” is the work done during
the period from 29th August 2022 to 15th December 2022 and is submitted in partial
fulfilment of the requirements for the award of Bachelor of Technology in Computer
Science and Engineering from Jawaharlal Nehru Technological University,
Hyderabad.
Finally, we would like to thank all our faculty, family, and friends for their
help and constructive criticism during our project period. We are also very much
indebted to our parents for their moral support and encouragement to achieve our goals.
M.SHIREESHA 19D21A05E6
M.POORVAJA 19D21A05E2
1.1 Model diagram
5.1 System Architecture
5.2 Use case diagram for user
5.3 Sequence diagram for user
5.4 Class diagram for user
5.5 Collaboration diagram for user
5.6 Activity diagram for user
7.1 Home Screen
7.2 Uploading dataset screen
7.3 Dataset loaded
7.4 Displaying Tweets
7.5 Possible bot users
7.6 ROC graph
7.7 Malicious bot users
7.8 ROC graph
1.2 SCOPE
Twitter is one of the fastest means of transferring information, and it significantly
influences how individuals think. An increasing number of people on Twitter mask their
identities for malicious reasons. Because such accounts pose a risk to other users, it is
important to recognize Twitter bots. Therefore, it is crucial that tweets are
posted by real people and not by Twitter bots. Twitter bots typically post spam-related
content, so identifying bots also helps in identifying spam messages.
Libraries in Python
TensorFlow
TensorFlow is a free and open-source software library for dataflow and differentiable
programming across a range of tasks. It is a symbolic math library, and is also
used for machine learning applications such as neural networks. It is used for both
research and production at Google.
NumPy
NumPy is a general-purpose array-processing package. It provides a high-performance
multidimensional array object and tools for working with these arrays.
It is the fundamental package for scientific computing with Python. It contains
various features including these important ones:
A powerful N-dimensional array object.
Sophisticated (broadcasting) functions.
Tools for integrating C/C++ and Fortran code.
Useful linear algebra, Fourier transform, and random number capabilities.
Besides its obvious scientific uses, NumPy can also be used as an efficient
multidimensional container of generic data.
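To make the feature list above concrete, here is a short sketch of the N-dimensional array object, broadcasting, linear algebra and random-number generation. The array values are hypothetical follower/friend counts chosen for illustration, not data from this project.

```python
import numpy as np

# Hypothetical counts: rows are accounts, columns are (followers_count, friends_count)
counts = np.array([[120, 300],
                   [45, 2000],
                   [15000, 80]], dtype=float)

# Broadcasting: the 1-D array of column maxima is stretched across every row
normalised = counts / counts.max(axis=0)

# Linear algebra: the 2 x 2 Gram matrix of the two feature columns
gram = counts.T @ counts

# Random numbers from a seeded generator, e.g. to shuffle the row order
rng = np.random.default_rng(seed=0)
shuffled_rows = rng.permutation(counts.shape[0])

print(counts.shape)  # (3, 2)
print(normalised)
```

Dividing by `counts.max(axis=0)` scales each column into the range 0 to 1 without writing an explicit loop, which is the kind of vectorised operation broadcasting is meant for.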
Pandas
Pandas is an open-source Python library that provides high-performance data
manipulation and analysis tools built on its powerful data structures. Before Pandas,
Python was mainly used for data munging and preparation and offered little support
for data analysis; Pandas solved this problem. Using Pandas, we can accomplish five
typical steps in the processing and analysis of data, regardless of the origin of the
data: load, prepare, manipulate, model, and analyze.
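The five steps can be sketched on a tiny in-memory CSV. The column names mirror those used by the logistic-regression code in this report, but the rows (including the `ghost_user` record with a missing value) are hypothetical, and step 4 only separates features from labels rather than training a model.

```python
import io

import pandas as pd

# Hypothetical CSV; ghost_user has a missing followers_count
csv_text = """screen_name,followers_count,friends_count,verified,bot
promo_account,45,1900,False,1
news_desk,150000,300,True,0
daily_deals,12,800,False,1
jane_doe,310,280,False,0
ghost_user,,50,False,1
"""

# 1. Load
df = pd.read_csv(io.StringIO(csv_text))

# 2. Prepare: drop rows with a missing follower count, then restore the integer dtype
df = df.dropna(subset=["followers_count"]).astype({"followers_count": "int64"})

# 3. Manipulate: derive a new feature column
df["friend_follower_ratio"] = df["friends_count"] / df["followers_count"]

# 4. Model: separate the feature matrix from the label vector for a classifier
features = df[["followers_count", "friends_count", "verified"]]
labels = df["bot"]

# 5. Analyze: compare average values for bot (1) and human (0) accounts
summary = df.groupby("bot")[["followers_count", "friend_follower_ratio"]].mean()
print(summary)
```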
3.1.1 Disadvantages
Low security
NON-FUNCTIONAL REQUIREMENTS
• Accuracy (in detecting malicious Twitter bots).
• Portability: the project should run on any device that has Python
installed on it.
SOFTWARE REQUIREMENTS
• Operating system - Windows 10
• Programming language - PYTHON
HARDWARE REQUIREMENTS
• Processor - Intel core i3 or higher
• Speed - 1.1 GHz
• RAM - 4 GB or higher
• Hard Disk - 500 GB
for word in data:
    bow[word] += 1  # add each word's frequency to the bag of words (bow must start every word at 0, e.g. a collections.Counter)
frequency = getFrequency(bow)  # total frequency of known BOT words
if frequency > 0 and listed < 16000 and followers < 200:  # if the condition is true, the account is treated as a bot
    users.append(screen)
text.insert(END, str(users) + "\n")
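The bag-of-words rule in the fragment above depends on helpers defined elsewhere in the project (`getFrequency`, the `bow` dictionary and the bot-word list). A self-contained sketch of the same rule follows. The bot-word set `BOT_WORDS`, the helper names `bot_word_frequency` and `is_possible_bot`, and the sample accounts are all hypothetical; only the thresholds (listed count below 16000, followers below 200) come from the code above.

```python
from collections import Counter

# Hypothetical spam/bot keyword list; the project's real word list is not shown here
BOT_WORDS = {"bot", "free", "follow", "win", "click"}


def bot_word_frequency(bag_of_words: Counter) -> int:
    """Total number of occurrences of known bot words in a bag of words."""
    return sum(count for word, count in bag_of_words.items() if word in BOT_WORDS)


def is_possible_bot(tweet_text: str, listed_count: int, followers_count: int) -> bool:
    """Bot words present, listed count < 16000 and followers < 200 (thresholds from the report)."""
    bag_of_words = Counter(tweet_text.lower().split())
    frequency = bot_word_frequency(bag_of_words)
    return frequency > 0 and listed_count < 16000 and followers_count < 200


# Hypothetical accounts: (screen_name, tweet text, listed_count, followers_count)
accounts = [
    ("promo_account", "Click here to WIN a free phone", 3, 45),
    ("news_desk", "Parliament session adjourned until Monday", 20000, 150000),
]
possible_bots = [name for name, tweet, listed, followers in accounts
                 if is_possible_bot(tweet, listed, followers)]
print(possible_bots)  # ['promo_account']
```

Using `Counter` avoids the `KeyError` that a plain dictionary would raise the first time a new word is counted.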
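import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from tkinter import END

# Account-level features used as model inputs (dataset is the loaded DataFrame)
train_attr = dataset[['followers_count', 'friends_count', 'listedcount',
                      'favourites_count', 'statuses_count', 'verified']]
train_label = dataset['bot']  # target column: 1 = bot, 0 = human
X = train_attr
Y = train_label.to_numpy()  # DataFrame.as_matrix() was removed in pandas 1.0; a 1-D label array avoids a shape warning
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)
logreg = LogisticRegression().fit(X_train, y_train)  # logistic regression object
actual = y_test
pred = logreg.predict(X_test)
accuracy = accuracy_score(actual, pred) * 100
precision = precision_score(actual, pred) * 100
recall = recall_score(actual, pred) * 100
f1 = f1_score(actual, pred)
auc = roc_auc_score(actual, pred)
# text is the Tkinter Text widget of the application window
text.insert(END, '\nLogistic Regression Accuracy : ' + str(accuracy) + "\n")
text.insert(END, 'Logistic Regression Precision : ' + str(precision) + "\n")
text.insert(END, 'Logistic Regression Recall is : ' + str(recall) + "\n")
text.insert(END, 'Logistic Regression Area Under Curve is : ' + str(auc))
fpr, tpr, thresholds = metrics.roc_curve(actual, pred)  # ROC points for the plot
plt.title('ROC')
plt.plot(fpr, tpr, 'b', label='AUC = %0.2f' % auc)
plt.legend(loc='lower right')
plt.show()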
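The code above passes the 0/1 output of `logreg.predict` to `roc_curve`, so the ROC plot has at most three points and the AUC reflects a single decision threshold. A common alternative, shown here as a sketch rather than as the project's method, is to pass the predicted bot-class probabilities from `predict_proba`. The example assumes scikit-learn is installed and uses synthetic data from `make_classification` in place of the Kaggle dataset, so its numbers say nothing about the project's results.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the six account features (hypothetical data, not the Kaggle file)
X, y = make_classification(n_samples=1000, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Hard 0/1 labels give a single threshold, so the ROC "curve" has at most three points
hard_pred = model.predict(X_test)
fpr_hard, tpr_hard, _ = roc_curve(y_test, hard_pred)

# Probabilities of the positive (bot) class let roc_curve sweep many thresholds
bot_scores = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, bot_scores)

auc_hard = roc_auc_score(y_test, hard_pred)
auc_proba = roc_auc_score(y_test, bot_scores)
print(f"ROC points from labels: {len(fpr_hard)}, from probabilities: {len(fpr)}")
print(f"AUC from labels: {auc_hard:.3f}, from probabilities: {auc_proba:.3f}")
```

Plotting `fpr` against `tpr` with the same `plt.plot` call would then draw a stepped curve instead of two straight segments.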
Operations are performed and tweets are extracted | Can do further operations and extract tweets | Data not uploaded
In this screen, we select and upload the ‘kaggle_tweets.csv’ file and then click
the ‘Open’ button to load the dataset and obtain the screen below.
Now click the ‘Run Module 2 (Recognize Twitter Bots using ML)’ button to
recognize bot users and then apply logistic regression to calculate the bot
prediction accuracy.