
2021 International Conference on Artificial Intelligence (ICAI)
Islamabad, Pakistan, April 05-07, 2021

Identifying Depression Among Twitter Users using Sentiment Analysis
Fiza Azam, Department of Software Engineering, Mehran University of Engineering and Technology, Jamshoro, Pakistan, fiza_et@outlook.com
Maha Agro, Department of Software Engineering, Mehran University of Engineering and Technology, Jamshoro, Pakistan, mahaagro48@gmail.com
Memoona Sami, Department of Software Engineering, Mehran University of Engineering and Technology, Jamshoro, Pakistan, memoona.sami@faculty.muet.edu.pk
978-1-6654-3293-1/20/$31.00 ©2021 IEEE | DOI: 10.1109/ICAI52203.2021.9445271

Muhammad Hassan Abro, Department of Software Engineering, Mehran University of Engineering and Technology, Jamshoro, Pakistan, hafeezhassan!96@gmail.com
Amirita Dewani, Department of Software Engineering, Mehran University of Engineering and Technology, Jamshoro, Pakistan, amirita.dewani@faculty.muet.edu.pk

Abstract—Major depressive disorder is one of the most crippling diseases; it accounts for 4.3% of the global disease burden. With depression being so prevalent, it is of paramount importance that a systematic method of diagnosis exists. However, current diagnostic methods, such as questionnaires and clinical diagnosis, rely on self-reported symptoms, and both are therefore susceptible to patient manipulation. Nowadays, social media websites such as Facebook, Twitter, Reddit, and Tumblr provide a way to obtain behavioral attributes of a person's thoughts and interactions. This research is focused on developing a machine learning model capable of analyzing linguistic patterns obtained from Twitter user data and determining whether a particular user exhibits depressive symptoms. We trained Support Vector Machine and Random Forest models for this purpose, compared their diagnostic performance, and concluded that Random Forest gave the best results. We believe the results obtained from this research can inform the development of new techniques for the effective identification of depressed users on social media platforms.

Keywords—Depression Analysis, Social Media, Twitter, Natural Language Processing, Sentiment Analysis, Bag-of-words, Support Vector Machine, Random Forest, Natural Language Toolkit, Machine Learning

I. Introduction

Although depression is often regarded as a serious illness, it is estimated to be the leading cause of disability worldwide. According to the WHO, approximately 264 million people suffer from depression globally [1]. This number is especially detrimental for developing countries, where 50.8 million people may be living under the shadow of depression. As of 2013, depression cost $71 billion in treatment costs in the United States, making it the 6th most costly health condition and the single most costly psychiatric condition to treat. Globally, an estimated $1 trillion is lost in productivity costs due to depression, and a total of $210.5 billion is lost as the economic cost of depression in the US [1]. Undiagnosed depression can lead to reduced quality of life, loneliness, and alienation from loved ones. According to the Anxiety and Depression Association of America, 6 to 8 percent of teens may be experiencing depression; the Centers for Disease Control and Prevention (CDC) estimates that 17% of people aged 13 to 14 had seriously considered suicide as of 2017; and depression is less prevalent in seniors, with only 1 to 5 percent of community-living seniors experiencing depression [2]. Gender plays a role in depression as well: women have a higher risk of developing depression than men, with about 1 in every 8 women expected to develop Major Depressive Disorder (MDD) once in her lifetime [1].

The previously mentioned statistics show that no matter the age, economic, or social status, depression can affect anyone at any time of life, and it can have devastating consequences, suicide being the worst-case scenario. Therefore, it is necessary to accurately identify the signs and symptoms of depression as quickly as possible. Early diagnosis can significantly improve quality of life as well as reduce the likelihood of substance abuse and the chances of suicide. However, traditional methods are often slow and plagued with misdiagnosis. These pitfalls present the need to search for a more accurate and timely method of diagnosis.

Considering the above facts, social media presents itself as an optimal tool for the identification of depression, as it significantly reduces the social bias associated with depression and provides people with related issues a means to form communities and support groups. Women especially have been found to be more expressive on social media than men. People go as far as to share deeply personal details. Moreover, social media is cheap, requiring no additional costs. Therefore, social media platforms such as Twitter and Facebook can be an excellent means to identify depression through natural language processing techniques.

II. Literature Review

The study conducted by M. Nadeem et al. [3] gathered a dataset of public Tweets from Twitter that matched the query string "I was diagnosed with X disease", where X could be either depression or PTSD. They mined behavioral and linguistic features from this dataset and applied ML algorithms, then assessed the result of each model. A. Leis et al. mined a dataset consisting of 140,946 tweets from 90 people, 1000 tweets of which belonged to users whose language indicated depression. They identified various key features, such as: depressed users used first-person pronouns often, and their use of negative language was higher compared to non-depressed people [4]. G. Coppersmith et al. gathered Tweets through the Twitter public API based on the query "I was



diagnosed with X", where "X" was replaced with four conditions: depression, post-traumatic stress disorder (PTSD), bipolar disorder, and seasonal affective disorder (SAD). They analyzed each user's language using three methods: LIWC Analysis, Language Model Analysis (a 1-gram language model and a character 5-gram language model), and Pattern of Life Analytics [5]. M. Park et al. used "Depression" as a keyword to gather Tweets of 14,817 users via the Twitter API. They applied a Multiple Regression Model and LIWC analysis and found that users posted private information about their depression; they also found that depressed users posted egocentric tweets more frequently to gather support. However, contrary to numerous popular studies, they did not identify gender as a factor [6]. K. Cornn [7] attempted to categorize comments found on Reddit as "depressed" or "non-depressed" using NLP and ML techniques, making use of Logistic Regression, SVM, a BERT-based model, as well as a CNN. A. E. Aladag et al. used the Reddit platform to distinguish between suicidal and non-suicidal posts with the help of text-mining techniques; they employed commonly used classification techniques such as Logistic Regression, Random Forest, and SVM. The result was that Logistic Regression performed better than the other techniques [8].

III. Methodology

Fig. 1 depicts the proposed methodology for this study. In the first step, we identified depressed and non-depressed users on the Twitter platform, then extracted their tweets. Afterward, we preprocessed the data. In the next step, feature extraction was performed, and lastly, we fed the data to two ML algorithms to get the results.

Fig. 1. Proposed methodology for identification of depressed users on Twitter.

A. Data Collection

Data gathering was performed in two phases: first, both depressed and control candidates were identified on the Twitter platform; then the tweets of their past two weeks were extracted.

1) Depressed Users Tweets Collection: For this study, instead of relying on mental health surveys, we followed the same approach as Coppersmith et al. [5] to automatically identify depressed candidates on the Twitter platform who publicly mentioned their depression diagnosis. Social media users opt for online platforms to discuss the societal stigma related to mental diseases. Many users publicly disclose their diagnosis using tweet statements such as "I was diagnosed with X", where X can be any mental disorder [9]. In the case of our study, we chose X as depression and ran two queries:
• "I was diagnosed with depression", and
• "I am diagnosed with depression"
to gather the relevant data, retrieving 4000 tweets in total, with each query contributing 2000 tweets. We manually checked all the retrieved data and filtered the genuine tweets of diagnosis. Table I shows the separation of genuine tweets from disingenuous ones.

TABLE I. Genuine and Disingenuous Statements of Tweets

Genuine:
1. "I can feel you dude I am 17 and I have been diagnosed with depression the past years were the worst in all my life and still I am fighting to get over it. Being down psychologically is not an easy thing hope you are good now."
2. "I just went to the doctor and I am now officially diagnosed with depression."

Disingenuous:
1. "I am a gun owner. But I also have never been suicidal or diagnosed with depression. One of my guns is my grandfather's military service weapon from WWII, and my other one is my carry weapon. I've also been trained in its use and safety, too."
2. "I am annoyed with those people telling that they have depression without being diagnosed."

The manual check resulted in the collection of 400 genuine tweets, and the data was further checked against the following two conditions:

• Each of the 400 candidates must have posted at least 100 tweets.
• Along with this, it was also checked whether most of their tweets were in the English language or not.

After the application of these conditions, we were left with 378 depressed candidates. The Center for Epidemiologic Studies Depression Scale (CES-D) [10], the Patient Health Questionnaire (PHQ-9) [12], and Beck's Depression Inventory (BDI) [11] are mental assessment tests that ask patients to fill out a form based on their behavior over the past one or two weeks. Keeping this in mind, we extracted the tweets of each of the 378 candidates from the period starting on the day they posted the diagnosis tweet up to two weeks later. Each user was treated as a single document: their retrieved tweets were concatenated and stored as a single record in the database. A total of 82,077 tweets were stored in the depressed dataset.

2) Control Users Tweets Collection: We ran a random query, "Today is my birthday", and retrieved 3200 tweets. These 3200 candidates were considered for our control dataset. Further, we followed the same approach as [4] and applied a filter of 15 words that constitute depression and its derivatives. Table II shows all the words we considered in the filter, along with the number of candidates before and after the removal of each word. The filter was applied to the most recent 3200 tweets of each candidate; if any candidate was found to use these words in their tweets, their record was removed from the control set. At this stage we were left with 495 users; next, we selected unique usernames to avoid redundant data, which resulted in 441 control users. Afterward, we checked the control data against the same two conditions that were applied to the depressed data beforehand: whether each user had posted at least 100 tweets, and whether most of their tweets were in the English language. By doing this, we were finally left with 308 control candidates.

TABLE II. Depression and its Derivatives

S.No  Word         Candidates Before  Candidates After
1     Depression   3200               1920
2     Anxiety      1920               1538
3     Distressed   1538               1525
4     Demotivated  1525               1523
5     Insomnia     1523               1464
6     Lonely       1464               1263
7     Empty        1263               1073
8     Exhausted    1073               1027
9     Worried      1027               924
10    Overwhelmed  924                906
11    Tired        906                746
12    Sad          746                561
13    Discouraged  561                558
14    Cry          558                501
15    Nervous      501                495

In the next phase, we extracted the public tweets from the period starting on the day they posted their birthday tweet up to two weeks later, concatenated them, and saved them in the database in such a manner that each user had just one record. In other words, each user was treated as a document and labeled as non-depressed. A total of 7,699 tweets were stored in the control dataset.

3) Anonymization: The usernames in both the depressed and control datasets were anonymized and replaced with numeric ids, and other information such as URLs and geolocation information was also removed from the tweets.

4) Database Table Structure: The data was stored in an SQL database; two tables were created, namely control_data and depress_data. Both tables had the same columns:
• Id: used to represent a particular user.
• Depressed: contained the numeric value 1 in the case of depressed users and 0 in the case of non-depressed users.
• Tweets: contained all the tweets of a particular user, concatenated in a single column.
• Count: contained the number of tweets retrieved from each user.

5) Caveats: Our approach for finding users has the following caveats:
• We have only considered users who have posted most of their tweets in English in both of our datasets; therefore, the data retrieved cannot be considered a representative sample of all Twitter users.
• The method utilized in this study to identify depressed users cannot verify whether the self-report of a diagnosis is true or not. However, it seems unlikely that users will post a false diagnosis of a condition they do not have.
• The control group might have been contaminated by some users who do suffer from a mental disorder and have not self-reported their diagnosis on Twitter.
• Users on Twitter cannot be considered a representative sample of the whole population of the world.

B. Data Cleaning

We applied the following filters for preprocessing our datasets:
• Conversion of emoji and emoticons to words using the Demoji and Emoticon APIs.
• Removal of URLs using regular expressions.
• Removal of non-English words and symbols using Python's Natural Language Toolkit (NLTK).
• Stop-word removal using NLTK, except for first-person pronouns, because, according to a study
person pronouns because according to a study

conducted by [4], first-person pronouns were found to be among the highest-occurrence words in the depressed dataset; therefore, they were retained so that they could serve as a feature.
• Tokenization and lemmatization of tweets were performed at the end with the help of NLTK.

C. Feature Extraction

While it is convenient for us to exchange information in natural language, computers cannot understand it. Rather, they work best with numbers, specifically vectors of numbers. The process of converting text into vectors is known as feature extraction. The bag-of-words model is commonly used for feature extraction purposes. It is based on two things:
• a lexicon of known words, and
• the frequency of those known words,
and it discards any information related to order or structure [13], as shown in Fig. 2. By feeding these vectors into our predictive models, we can estimate the likelihood of depression in a person.

Fig. 2. Working of the Bag of Words technique.

D. Algorithms Selection

Through our review of various similar academic papers, we learned that the Support Vector Machine (SVM) has been used frequently for sentiment analysis with favorable results [3] [7] [8] [14] [15]. We also found that Random Forest is a relatively simple classifier that is frequently used for sentiment analysis [15].

1) Support Vector Machine: SVM [16] is a binary classification algorithm typically used when the data to be analyzed is limited. SVM analyzes a set of data points and finds a hyperplane, which is essentially a line, that can best separate the data based on their type or class. SVM generates various such hyperplanes, but the one that separates the data points optimally is chosen. Based on the position of the hyperplane, the data is divided into separate classes.

2) Random Forest: A random forest classifier [17] is essentially an ensemble of decision trees. Each decision tree outputs a "decision", i.e., a label that predicts the class of the data provided. The class that appears most often is chosen as the label of the data. As we increase the number of trees, the prediction accuracy increases. The random forest classifier is commonly used for regression, classification, and other tasks. Fig. 4 shows the basic working of the Random Forest algorithm.

Fig. 4. Working of the Random Forest algorithm. The class which is predicted the most is selected.

E. Algorithm Implementation

For the algorithm implementation, we used the support vector machine and random forest classifiers available through Python's scikit-learn library. After data cleaning, we applied the bag-of-words approach to our dataset, using only the words as features. As discussed before, the tweets of each user were merged into a single record; if the user belonged to the depressed class, the depressed column was labeled 1, else it was labeled 0. We passed the bag-of-words vectors and the labeled data to our classifiers.

For both classifiers, the data was split 80% train and 20% test, which means the train set was composed of 547 records, whereas there were a total of 138 records in the test set. We then applied the classifiers to our dataset and saved the predicted classes along with the actual classes in a CSV file to form the classification report.
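The pipeline described above can be sketched end-to-end with scikit-learn. The following is a minimal sketch on a toy corpus; the example documents, labels, and classifier parameters are our own illustration, not the paper's 685-record dataset:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy stand-in corpus: one concatenated "tweet record" per user.
docs = ["i feel sad and empty", "i am so tired of everything",
        "i cry alone at night", "happy birthday to me",
        "great game last night", "lovely weather today"] * 5
labels = [1, 1, 1, 0, 0, 0] * 5  # 1 = depressed, 0 = non-depressed

# Bag of words: each document becomes a vector of word counts.
X = CountVectorizer().fit_transform(docs)

# 80% train / 20% test split, as in the paper.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=42)

# Fit both classifiers and print their classification reports.
for clf in (SVC(kernel="linear"), RandomForestClassifier(random_state=42)):
    clf.fit(X_train, y_train)
    print(type(clf).__name__)
    print(classification_report(y_test, clf.predict(X_test)))
```

The printed classification reports contain the same precision, recall, F1-score, support, and accuracy fields discussed in the next section.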

Fig. 3. Working of the Support Vector Machine. If filled and unfilled dots represent two classes, then hyperplane "H" optimally separates the two classes.

IV. Classification Metrics

A classification report provides certain metrics to assess the performance of the model. These metrics are calculated using True Positives, True Negatives, False Positives, and False Negatives; "True" and "False" here refer to the depressed and non-depressed classes.
• A True Positive indicates that the model predicted the depressed class as depressed.
• A False Positive indicates that the model predicted the non-depressed class as depressed.
• A True Negative indicates that the model predicted the non-depressed class as non-depressed.
• A False Negative indicates that the model predicted the depressed class as non-depressed.
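These four counts are all that is needed to compute the metrics defined next. A small worked example with made-up counts (illustrative only, not our experimental values):

```python
# Hypothetical confusion counts (illustrative only, not our results).
tp, fp, tn, fn = 8, 2, 8, 2

precision = tp / (tp + fp)                          # 8/10 = 0.8
recall = tp / (tp + fn)                             # 8/10 = 0.8
f1 = 2 * precision * recall / (precision + recall)  # approximately 0.8
accuracy = (tp + tn) / (tp + fp + tn + fn)          # 16/20 = 0.8
```

With balanced counts like these, all four metrics coincide; they diverge as soon as the error types (fp vs. fn) become unequal.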

A. Precision

Precision describes the accuracy of depressed predictions, i.e., how many of the documents identified as depressed were actually depressed. It is calculated using (1):

    Precision = True Positive / (True Positive + False Positive)    (1)

B. Recall

Recall describes whether the classifier was able to identify all the depressed occurrences. It is calculated with the formula shown in (2):

    Recall = True Positive / (True Positive + False Negative)    (2)

C. F1-Score

The F1-score was used to contrast the classifiers. It is the weighted average of precision and recall, and it is used when seeking a balance between the two. It is calculated using (3):

    F1 = 2 * (Precision * Recall) / (Precision + Recall)    (3)

D. Accuracy

Accuracy is the proportion of instances predicted correctly, i.e., of all the documents identified as depressed and non-depressed, how many were correctly predicted. It is calculated by (4):

    Accuracy = (True Positive + True Negative) / (True Positive + True Negative + False Positive + False Negative)    (4)

E. Support

Support shows the number of instances found for a class.

V. Results

We obtained the following results with the SVM classifier, as specified by the classification report shown in Table III. The precision value of the depressed class is higher than that of the non-depressed class, whereas for recall and F1-score the non-depressed class achieved the higher value. Overall, the model achieved an accuracy of 0.73.

TABLE III. Results of SVM

                Precision   Recall   F1-score   Support
Non-depressed     0.64       0.94      0.76        62
Depressed         0.91       0.57      0.70        76
Accuracy                               0.73       138

The results obtained with the Random Forest classifier are shown in Table IV. The depressed class has greater precision and F1-score values, whereas the non-depressed class achieved a greater recall score. The model as a whole achieved an accuracy of 0.77.

TABLE IV. Results of Random Forest

                Precision   Recall   F1-score   Support
Non-depressed     0.71       0.81      0.76        62
Depressed         0.82       0.74      0.78        76
Accuracy                               0.77       138

We see that while the SVM produced better values for precision on the depressed class and recall on the non-depressed class, the Random Forest classifier has a better F1-score, which means there is a better balance between precision and recall for both the depressed and non-depressed classes. Moreover, the accuracy of the Random Forest is also slightly greater than that of the SVM, so it can be said that the Random Forest performed best.

VI. Conclusion

We conclude that machine learning algorithms can indeed be used to help identify depression among social media users. However, the models need to be refined further to make them as accurate as possible; currently, the accuracy achieved by both ML models is close to 80 percent, and with additional features they may approach 90% accuracy.

VII. Future Work

These models can be used to assist medical professionals in the diagnosis of depression. More features, such as age, gender, number of followers, and the day of the week when most of the tweets are posted, can be used to refine the model so that it can identify depressed users on Twitter as accurately as possible.

VIII. References

[1] World Health Organization, "Depression," World Health Organization, Jan. 30, 2020. https://www.who.int/news-room/fact-sheets/detail/depression [Accessed Feb. 17, 2020].
[2] "Depression is not a normal part of growing older," Centers for Disease Control and Prevention, Jan. 31, 2017. Accessed: Feb. 20, 2020. [Online]. Available: https://www.cdc.gov
[3] M. Nadeem, M. Horn, G. Coppersmith, and S. Sen, "Identifying depression on Twitter," ArXiv, vol. abs/1607.07384, 2016. Accessed: Mar. 02, 2020. [Online]. Available: https://www.semanticscholar.org
[4] A. Leis, F. Ronzano, M. A. Mayer, L. I. Furlong, and F. Sanz, "Detecting signs of depression in tweets in Spanish: behavioral and linguistic analysis," Journal of Medical Internet Research, vol. 21, no. 6, Jun. 2019, doi: 10.2196/14199.
[5] G. Coppersmith, M. Dredze, and C. Harman, "Quantifying mental health signals in Twitter," Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, pp. 51-60, Jun. 2014, doi: 10.3115/v1/w14-3207.
[6] M. Park, C. Cha, and M. Cha, "Depressive moods of users portrayed in Twitter," NYU Scholars, pp. 1-8, Aug. 2012.
[7] K. Cornn, "Identifying depression on social media." Accessed: Mar. 20, 2020. [Online]. Available: https://www.stanford.edu

[8] A. E. Aladag, S. Muderrisoglu, N. B. Akbas, O. Zahmacioglu, and H. O. Bingol, "Detecting suicidal ideation on forums: proof-of-concept study," Journal of Medical Internet Research, vol. 20, no. 6, p. e215, Jun. 2018, doi: 10.2196/jmir.9840.
[9] G. Coppersmith, M. Dredze, C. Harman, K. Hollingshead, and M. Mitchell, "CLPsych 2015 shared task: depression and PTSD on Twitter," Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, 2015, doi: 10.3115/v1/w15-1204.
[10] L. Radloff, "The CES-D scale: a self-report depression scale for research in the general population," Appl. Psychol. Meas., vol. 1, no. 3, pp. 385-401, 1977.
[11] A. T. Beck, C. H. Ward, M. Mendelson, J. Mock, and J. Erbaugh, "An inventory for measuring depression," Arch. Gen. Psychiatry, vol. 4, no. 6, pp. 561-571, 1961.
[12] K. Kroenke, R. L. Spitzer, and J. B. W. Williams, "The PHQ-9: validity of a brief depression severity measure," Journal of General Internal Medicine, vol. 16, no. 9, pp. 606-613, Sep. 2001, doi: 10.1046/j.1525-1497.2001.016009606.x.
[13] J. Brownlee, "How to develop a word-level neural language model and use it to generate text," Machine Learning Mastery, Nov. 09, 2017. Accessed: Apr. 07, 2020. [Online]. Available: https://machinelearningmastery.com
[14] V. M. Prieto, S. Matos, M. Alvarez, F. Cacheda, and J. L. Oliveira, "Twitter: a good place to detect health conditions," PLoS ONE, vol. 9, no. 1, p. e86191, Jan. 2014, doi: 10.1371/journal.pone.0086191.
[15] Md. R. Islam, M. A. Kabir, A. Ahmed, A. R. M. Kamal, H. Wang, and A. Ulhaq, "Depression detection from social network data using machine learning techniques," Health Information Science and Systems, vol. 6, no. 1, Aug. 2018, doi: 10.1007/s13755-018-0046-0.
[16] "1.4. Support Vector Machines — scikit-learn 0.20.3 documentation," Scikit-learn.org, 2018. Accessed: May 02, 2020. [Online]. Available: https://scikit-learn.org
[17] "sklearn.ensemble.RandomForestClassifier — scikit-learn 0.20.3 documentation," Scikit-learn.org, 2018. Accessed: May 03, 2020. [Online]. Available: https://scikit-learn.org

