Abstract—Twitter is a social network used by hundreds of millions of people. Many use it to share important information, and it is also used by high-profile individuals such as celebrities, politicians, and others with large followings. With so many users, it can be difficult to distinguish the traits of any one user. To address this, we are building an API that works in real time and predicts and classifies a user's social traits with the help of our trained models. These traits include the user's domain or area of interest, chosen from a set that we pre-selected. The API will be able to predict 11 different domains by supplying a set of up to 50 features to various machine learning models, mainly random forest and logistic regression. The API is currently under development, so we will analyze the feasibility of the work described above and take further steps according to its progress.

I. INTRODUCTION

A. Description of the Project

Twitter is one of the most commonly used social platforms, with around 600 million users. People around the globe with different traits use Twitter, so this project is mainly useful for understanding those users. Identifying a user's domain has many applications. For example, a user in a particular domain usually follows, or is followed by, users with the same domain of interest, which produces a chain of users with similar interests. By identifying these blocks of people, we can carry out marketing activities or any other task that depends on knowing users' domains. Apart from identifying blocks of users who share a domain of interest, this project can also be used to identify the domain of a single user for multiple purposes.

... learning was the most suitable field for this project.

C. Relevant Background

The classification of users is a widely applicable research topic and has been applied to many platforms, including Twitter. Much of the existing work identifies whether a user is a bot or a human, or classifies users into only 2 to 3 classes. This work proposes 11 classes (Sports, Business, Politics, Entertainment, Science and Technology, Religion, General category, Health, Education, News, Wellbeing), and so extends the existing classification work on Twitter.

II. MOTIVATION AND BACKGROUND

This project is based on classifying Twitter users into 11 major classes (Sports, Business, Politics, Entertainment, Science and Technology, Religion, General category, Health, Education, News, Wellbeing). Since this is a classification problem, machine learning and the classification models it provides are the most suitable tools for this project. Similar work exists on Twitter user classification, but classification into 11 major classes has not been done before.

Classification: Grouping any element with its particular category is known as classification.

Machine Learning: Machine learning is an entire field that provides algorithms and techniques for classification and other tasks.

Data Augmentation: A machine learning technique used to compensate for data imbalance.
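
As an illustration of how class imbalance might be compensated for, the following is a minimal sketch of random oversampling, one simple balancing technique. The text does not state which technique the project uses, so this choice is an assumption. The function name `oversample`, the toy feature vectors, and the class counts are all hypothetical and made up for the example.

```python
import random
from collections import Counter


def oversample(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until every class
    present in `labels` has as many samples as the largest class."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # Every class is grown to the size of the largest one.
    target = max(len(group) for group in by_class.values())

    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing samples (with replacement) from the class itself.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy, made-up data: each "user" is a hypothetical two-value feature vector.
toy_samples = [[i, i % 3] for i in range(10)]
toy_labels = ["Sports"] * 6 + ["Politics"] * 3 + ["Health"] * 1

balanced_samples, balanced_labels = oversample(toy_samples, toy_labels)
balanced_counts = Counter(balanced_labels)
```

In practice, balancing of this kind would normally be applied only to the training split, so that duplicated users do not leak into the evaluation data.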
REFERENCES
[1] https://www.researchgate.net/publication/261994329_Understanding_Types_of_Users_on_Twitter
[2] https://ieeexplore.ieee.org/document/6280553
[3] https://www.researchgate.net/publication/228855257_Detecting_spammers_on_Twitter
[4] http://pike.psu.edu/publications/asonam13.pdf
[5] Anaconda Software: https://www.anaconda.com/what-is-anaconda/
[6] Python download: https://www.python.org/downloads/release/python-374/
[7] What is feature engineering: https://medium.com/mindorks/what-is-feature-engineering-for-machine-learning-d8ba3158d97a
[8] What is data augmentation: https://bair.berkeley.edu/blog/2019/06/07/data_aug/