Presented By: Goli Samba Kumar (20011F0008), MCA 4th SEM
Guided By: I. LAKSHMI MANIKYAMBA, Associate Professor, CSE Department
Computer Science Department May-2022
Contents
• Brief Introduction
• Proposed System
• Tools Used
• Implementation
• Importing and Feature Extraction
• Sentiment Analysis
• Modeling and Evaluation
• Results
• Conclusion
• References

BRIEF INTRODUCTION
Twitter has become a target platform for spammers, who use it to disperse large amounts of irrelevant and harmful information.
Spammers find these platforms easy to access and use them to trap users in malicious activities by posting spam tweets. The main purpose of this work is to classify each tweet as 'spam' or 'ham' (legitimate) and to evaluate the emotion expressed in it.

PROPOSED SYSTEM
The analysis also shows that several machine-learning-based techniques can be effective for identifying spam on Twitter. The classification phase of the model decides whether each test tweet is spam or ham. This makes it easy to predict whether a user is posting spam or quality content on Twitter.

Tools used
i. Data manipulation packages (Pandas, NumPy, etc.)
ii. Noise removal libraries (regular expressions)
iii. Textual data preprocessing with NLTK (punctuation removal, stemming and lemmatization)
iv. Sentiment analysis and classification using the TextBlob classifier

Implementation
• Fetching the dataset from Kaggle
• Importing libraries
• Preprocessing
• Feature extraction
• Feature matching
• Model construction
• Results evaluation

Importing & Feature extraction
• Detection of spam words
• Word frequencies

Sentiment analysis
Sentiment analysis determines whether the writer's attitude towards a particular topic, product, etc. is positive, negative, or neutral. This work uses the TextBlob tool for it.

Modelling and evaluation

Naïve Bayes
Naïve Bayes is an efficient classifier used to label a text message as spam or ham. It is based on probability theory, specifically Bayes' theorem. It assumes that the probability of one attribute does not affect the probability of another, that is, the attributes are conditionally independent given the class. This model is used because it performs well and needs little computational time to train.

Decision tree
A decision tree builds a classification model from the training data, and that model is then used to predict the class label of a test sample. The algorithm represents the classification problem as a tree. Each internal node tests an attribute, each branch is an outcome of that test, and each leaf holds a class label.
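The noise-removal and preprocessing steps listed under Tools used and Implementation can be sketched as follows. This is a minimal illustration, not the project's code. The function name clean_tweet and the exact regular expressions are assumptions, and the sketch uses only Python's standard re module. In the project, NLTK handles stemming and lemmatization; a stemmer such as NLTK's nltk.stem.PorterStemmer (its stem method) could then be applied to each returned token.

```python
import re

# Illustrative patterns (assumptions, not the project's exact rules).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")  # punctuation, digits, '#', emoji, ...


def clean_tweet(text: str) -> list[str]:
    """Lowercase a tweet, strip URLs, @mentions and punctuation, then tokenize."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # remove links first so their parts are not kept
    text = MENTION_RE.sub(" ", text)    # remove @user handles
    text = NON_ALPHA_RE.sub(" ", text)  # '#giveaway' becomes ' giveaway'
    return text.split()                 # split() also collapses repeated whitespace


print(clean_tweet("WIN a FREE iPhone!!! Click http://spam.example/x @friend #giveaway"))
# → ['win', 'a', 'free', 'iphone', 'click', 'giveaway']
```

URLs are removed before the punctuation filter. Otherwise fragments such as "http" and "spam" from the link would survive as tokens.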
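The "detection of spam words" and "word frequencies" step under Importing & Feature extraction can be illustrated by counting how often each word occurs in spam versus ham tweets. The sketch below uses collections.Counter. The tiny labelled sample and the helper name top_words are made up for illustration and are not taken from the Kaggle dataset.

```python
from collections import Counter

# Made-up, already tokenized examples; not real data.
tweets = [
    (["win", "free", "iphone", "click", "now"], "spam"),
    (["free", "followers", "click", "link"], "spam"),
    (["great", "game", "last", "night"], "ham"),
    (["free", "coffee", "at", "the", "office", "today"], "ham"),
]


def top_words(data, label, n=3):
    """Return the n most frequent words among tweets carrying the given label."""
    counts = Counter()
    for tokens, tweet_label in data:
        if tweet_label == label:
            counts.update(tokens)  # add one count per occurrence of each token
    return counts.most_common(n)


print(top_words(tweets, "spam"))
# → [('free', 2), ('click', 2), ('win', 1)]
```

Counter.most_common orders words with equal counts in the order they were first encountered, so the tie-breaking above is deterministic.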
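The sentiment step described under Sentiment analysis can be sketched with TextBlob. Its TextBlob(text).sentiment.polarity returns a score between -1.0 (negative) and 1.0 (positive). Mapping that score to a label with a zero threshold, as below, is an assumed convention, since the slides do not state the thresholds used. The mapping is kept in its own function so it works without TextBlob installed.

```python
def polarity_to_label(polarity: float) -> str:
    """Map a polarity score in [-1.0, 1.0] to a label (zero threshold is an assumption)."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def tweet_sentiment(text: str) -> str:
    """Label a tweet's sentiment from TextBlob's polarity score."""
    from textblob import TextBlob  # imported here so polarity_to_label needs no TextBlob
    return polarity_to_label(TextBlob(text).sentiment.polarity)
```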
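The Naïve Bayes description above combines Bayes' theorem with the independence assumption. The classifier picks the class c that maximises log P(c) plus the sum, over the words w in the tweet, of log P(w | c). Below is a minimal from-scratch multinomial Naïve Bayes with add-one (Laplace) smoothing, working in log space to avoid numeric underflow. It is an illustrative sketch on made-up data. The slides do not show the project's code, and the class name NaiveBayes is hypothetical.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes over word tokens with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for doc in docs for w in doc}
        self.log_prior, self.word_counts, self.total_words = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            # log P(c): fraction of training tweets that belong to class c
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(w for d in class_docs for w in d)
            self.word_counts[c] = counts
            self.total_words[c] = sum(counts.values())
        return self

    def predict_one(self, doc):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for w in doc:
                if w not in self.vocab:
                    continue  # ignore words never seen during training
                # log P(w | c) with add-one smoothing; words are treated as independent
                score += math.log((self.word_counts[c][w] + 1) / (self.total_words[c] + v))
            scores[c] = score
        return max(scores, key=scores.get)


docs = [["win", "free", "prize"], ["free", "click", "link"],
        ["meeting", "at", "noon"], ["see", "you", "at", "lunch"]]
labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayes().fit(docs, labels)
print(model.predict_one(["free", "prize", "link"]))  # → spam
```

Training only counts words per class, which is why Naïve Bayes is cheap to train compared with the other classifiers in this study.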
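A decision tree is grown by repeatedly choosing the attribute test that best separates the classes. The slides do not name the split criterion, so the sketch below assumes Gini impurity (1 minus the sum of squared class proportions), the criterion used by CART. It scores each word-presence test by the weighted Gini impurity of the two resulting child nodes and returns the best one. A full tree applies the same step recursively to each child. The helper names and the data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(docs, labels):
    """Return (word, weighted_gini) for the word-presence test that best separates labels."""
    vocab = sorted({w for d in docs for w in d})
    n = len(labels)
    best_word, best_score = None, gini(labels)  # a split must beat the unsplit node
    for word in vocab:
        left = [y for d, y in zip(docs, labels) if word in d]       # tweets containing word
        right = [y for d, y in zip(docs, labels) if word not in d]  # tweets without it
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_word, best_score = word, score
    return best_word, best_score


docs = [{"win", "free"}, {"free", "click"}, {"meeting", "noon"}, {"lunch", "today"}]
labels = ["spam", "spam", "ham", "ham"]
print(best_split(docs, labels))  # → ('free', 0.0)
```

Here "free" appears in every spam tweet and no ham tweet, so splitting on it yields two pure children with impurity 0.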
KNN algorithm
To classify a test tweet, the KNN algorithm identifies the k training samples most similar to it. Similarity between samples is computed with a similarity or distance measure, for example Euclidean distance or cosine similarity. The test tweet is then assigned the class held by the majority of its k nearest neighbours.

Random forest
Random Forest is a very flexible machine-learning classifier that consists of a collection of tree-structured classifiers. It builds each decision tree from randomly selected features (and typically a random sample of the training data), and the forest predicts by majority vote over its trees. Just as a forest is composed of trees, adding more trees generally makes the combined vote more stable.

Results…
[Table: evaluation metrics for Naïve Bayes, Random Forest, KNN and Decision Tree]
Random Forest gives the best results and performance of the four classifiers.
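The KNN procedure described in the KNN algorithm section can be sketched as follows. The k most similar training tweets are found, then a majority vote decides the label. Cosine similarity over word-count vectors and k=3 are assumptions chosen for illustration, since the slides do not say which similarity measure or value of k the project used. The training data is made up.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())  # b[word] is 0 if missing
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, tokens, k=3):
    """Label a tweet by majority vote among its k most similar training tweets."""
    query = Counter(tokens)
    scored = sorted(
        ((cosine_similarity(query, Counter(doc)), label) for doc, label in train),
        key=lambda pair: pair[0],
        reverse=True,  # most similar first
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


train = [
    (["win", "free", "prize"], "spam"),
    (["free", "click", "link"], "spam"),
    (["claim", "free", "prize", "now"], "spam"),
    (["meeting", "at", "noon"], "ham"),
    (["see", "you", "at", "lunch"], "ham"),
]
print(knn_predict(train, ["free", "prize", "today"]))  # → spam
```

KNN has no real training phase: all the work happens at prediction time, when every training tweet is compared against the query.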
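The Random Forest idea described in the Random forest section can be sketched from scratch. Each tree is trained on a bootstrap sample of the training tweets, each split considers only a random subset of words, and the forest predicts by majority vote. This is a small illustration on made-up data, not the project's implementation. The function names and the settings n_trees, max_depth and max_features (defaulting to the square root of the vocabulary size) are assumptions.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure or empty node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(docs, labels, depth, max_depth, max_features, rng):
    """Grow a tree of word-presence tests; a leaf is the majority label (a string)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    vocab = sorted({w for d in docs for w in d})
    # Random feature subset: only these words are considered for this split.
    candidates = rng.sample(vocab, min(max_features, len(vocab)))
    best, best_score = None, gini(labels)
    for word in candidates:
        left = [i for i, d in enumerate(docs) if word in d]
        right = [i for i, d in enumerate(docs) if word not in d]
        if not left or not right:
            continue
        score = (len(left) * gini([labels[i] for i in left])
                 + len(right) * gini([labels[i] for i in right])) / len(docs)
        if score < best_score:
            best, best_score = (word, left, right), score
    if best is None:
        return majority
    word, left, right = best

    def child(idx):
        return build_tree([docs[i] for i in idx], [labels[i] for i in idx],
                          depth + 1, max_depth, max_features, rng)

    return (word, child(left), child(right))  # (test word, present branch, absent branch)


def predict_tree(node, doc):
    """Follow word-presence tests from the root down to a leaf label."""
    while isinstance(node, tuple):
        word, present, absent = node
        node = present if word in doc else absent
    return node


def fit_forest(docs, labels, n_trees=25, max_depth=3, max_features=None, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the training tweets."""
    rng = random.Random(seed)
    if max_features is None:
        n_words = len({w for d in docs for w in d})
        max_features = max(1, int(math.sqrt(n_words)))  # common sqrt(#features) choice
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(docs)) for _ in docs]  # sample with replacement
        forest.append(build_tree([docs[i] for i in idx], [labels[i] for i in idx],
                                 0, max_depth, max_features, rng))
    return forest


def predict_forest(forest, doc):
    """Majority vote over the predictions of all trees."""
    return Counter(predict_tree(tree, doc) for tree in forest).most_common(1)[0][0]


docs = [{"win", "free", "prize"}, {"free", "click", "link"}, {"claim", "free", "now"},
        {"meeting", "at", "noon"}, {"see", "you", "at", "lunch"}, {"call", "me", "later"}]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
forest = fit_forest(docs, labels, n_trees=25, max_depth=3, seed=42)
print(predict_forest(forest, {"free", "prize", "today"}))
```

Because each tree sees different data and different candidate words, individual trees make different mistakes. The majority vote tends to cancel many of them out, which is one reason an ensemble can outperform a single decision tree.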
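The comparison of classifiers rests on metrics computed from true and predicted labels. The sketch below treats spam as the positive class and computes accuracy, precision, recall and F1. Apart from accuracy, the slides do not say which metrics the results table reports, so this set is an assumption. The example labels are made up.

```python
def evaluate(y_true, y_pred, positive="spam"):
    """Accuracy, precision, recall and F1, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # spam caught
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # ham flagged as spam
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # spam missed
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = ["spam", "spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "spam", "spam"]
print(evaluate(y_true, y_pred))
```

Accuracy is the fraction of test tweets labelled correctly, the measure behind the 82.0 percent figure reported in the conclusion. Precision and recall additionally show whether errors come from flagging legitimate tweets or from missing spam.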
Conclusion
• The aim is to protect Twitter's reputation and to make users aware of spam tweets so they can avoid them.
• This research studies spam classification techniques on Twitter.
• The Random Forest classifier achieved the highest accuracy, 82.0 percent, making it the most effective of the models tested for detecting spam.

References
1. C. Grier, K. Thomas, V. Paxson, and M. Zhang, "@spam: The underground on 140 characters or less," in Proc. ACM Conf. Computer and Communications Security, 2010, pp. 27-37.
2. Y. Boshmaf, I. Muslukhov, K. Beznosov, and M. Ripeanu, "Design and analysis of a social botnet," Computer Networks, vol. 57, no. 2, pp. 556-578, 2013.
3. A. Bhangare, S. Ghodke, K. Walunj, and U. Yewale, "Twitter spammer detection," International Research Journal of Engineering .