Abstract—Major depressive disorder is one of the most crippling diseases, accounting for 4.3% of the global disease burden. With depression so prevalent, it is of paramount importance that a systematic method of diagnosis exist. However, current diagnostic methods, such as questionnaires and clinical interviews, rely on self-reported symptoms, and both are therefore susceptible to patient manipulation. Today, social media websites such as Facebook, Twitter, Reddit, and Tumblr provide a way to obtain behavioral attributes of a person's thoughts and interactions. This research focuses on developing a machine learning model capable of analyzing linguistic patterns obtained from Twitter user data and determining whether a particular user exhibits depressive symptoms. We trained Support Vector Machine and Random Forest models for this purpose, compared their diagnostic performance, and concluded that Random Forest gave the best results. We believe the results obtained from this research can inform the development of new techniques for the effective identification of depressed users on social media platforms.

Keywords—Depression Analysis, Social Media, Twitter, Natural Language Processing, Sentiment Analysis, Bag-of-Words, Support Vector Machine, Random Forest, Natural Language Toolkit, Machine Learning

I. Introduction

Depression is not only a serious illness; it is estimated to be the leading cause of disability worldwide. According to the WHO, approximately 264 million people suffer from depression globally [1]. The burden is especially heavy in developing countries, where 50.8 million people may be living under the shadow of depression. As of 2013, depression accounted for $71 billion in treatment costs in the United States, making it the 6th most costly health condition and the most costly psychiatric condition to treat. Globally, an estimated $1 trillion is lost in productivity due to depression, and a total of $210.5 billion is lost as the economic cost of depression in the US [1]. Undiagnosed depression can lead to reduced quality of life, loneliness, and alienation from loved ones. According to the Anxiety and Depression Association of America, 6 to 8 percent of teens may be experiencing depression, and the Centers for Disease Control and Prevention (CDC) estimates that, as of 2017, 17% of people aged 13 to 14 had seriously considered suicide. Depression is less prevalent in seniors: only 1 to 5 percent of community-living seniors experience depression [2]. Gender plays a role as well; women have a higher risk of developing depression than men, with about 1 in every 8 women expected to develop Major Depressive Disorder (MDD) in her lifetime [1].

These statistics show that regardless of age or economic and social status, depression can affect anyone at any time of life, and it can have devastating consequences, suicide being the worst-case scenario. It is therefore necessary to identify the signs and symptoms of depression accurately and as early as possible. Early diagnosis can significantly improve quality of life and reduce the likelihood of substance abuse and the risk of suicide. However, traditional methods are often slow and plagued by misdiagnosis. These pitfalls motivate the search for a more accurate and timely method of diagnosis.

Considering the above facts, social media presents itself as an optimal tool for the identification of depression: it significantly reduces the social bias associated with depression and provides people with related issues a means to form communities and support groups. Women especially have been found to be more expressive on social media than men, and people go as far as to share deeply personal details. Moreover, social media is cheap, requiring no additional costs. Therefore, social media platforms such as Twitter and Facebook can be an excellent means to identify depression through natural language processing techniques.

II. Literature Review

The study conducted by M. Nadeem et al. [3] gathered a dataset of public Tweets that matched the query string "I was diagnosed with X disease", where X could be either depression or PTSD. They mined behavioral and linguistic features from this dataset, applied ML algorithms, and assessed the results of each model. A. Leis et al. mined a dataset consisting of 140,946 tweets from 90 people, 1000 of which came from users whose language indicated depression. They identified various key features: for example, depressed users used first-person pronouns often, and their use of negative language was higher than that of non-depressed people [4]. G. Coppersmith et al. gathered Tweets through the Twitter public API based on the query "I was
Authorized licensed use limited to: University of Prince Edward Island. Downloaded on July 04,2021 at 12:23:02 UTC from IEEE Xplore. Restrictions apply.
• Each of the 400 candidates must have posted at least 100 tweets.
• Along with this, it was also checked whether most of their tweets were in the English language.

After the application of these conditions, we were left with 378 depressed candidates. The Center for Epidemiological Studies Depression scale (CES-D) [10], the Patient Health Questionnaire (PHQ-9) [12], and Beck's Depression Inventory (BDI) [11] are mental assessment tests that ask patients to fill in a form based on their behavior over the past one or two weeks. With this in mind, we extracted the tweets of each of the 378 candidates from the period starting on the day they posted the diagnosis tweet up to two weeks later. Each user was treated as a single document: their retrieved tweets were concatenated and stored as a single record in the database. A total of 82,077 tweets were stored in the depressed dataset.

2) Control Users Tweets Collection: We ran the query "Today is my birthday" and retrieved 3200 tweets. These 3200 candidates were considered for our control dataset. Further, we followed the same approach as [4] and applied a filter of words that constitute depression and its derivatives. Table II shows all the words we considered in the filter, along with the number of candidates before and after the removal of each word. The filter was applied to the recent 3200 tweets of each candidate; if a candidate was found to use any of these words in their tweets, their record was removed from the control set. At this stage we were left with 495 users. Next, we selected unique usernames to avoid redundant data, which resulted in 441 control users. Afterward, we checked the control data against the two conditions that had been applied to the depressed data beforehand: whether each user had posted at least 100 tweets and whether most of their tweets were in English. By doing this we were finally left with 308 control candidates.

TABLE II. Depression and its Derivatives

S.no | Word        | Candidates Before | Candidates After
1    | Depression  | 3200              | 1920
2    | Anxiety     | 1920              | 1538
3    | Distressed  | 1538              | 1525
4    | Demotivated | 1525              | 1523
5    | Insomnia    | 1523              | 1464
6    | Lonely      | 1464              | 1263
7    | Empty       | 1263              | 1073
8    | Exhausted   | 1073              | 1027
9    | Worried     | 1027              | 924
10   | Overwhelmed | 924               | 906
11   | Tired       | 906               | 746
12   | Sad         | 746               | 561
13   | Discouraged | 561               | 558
14   | Cry         | 558               | 501
15   | Nervous     | 501               | 495
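As an illustration, the word filter described above can be sketched in Python. The data structure and usernames here are hypothetical, and the matching rule (whole word plus simple suffixes) is an assumption, since the paper does not specify exactly how derivatives were matched.

```python
import re

# Filter words taken from Table II (the depression derivatives).
FILTER_WORDS = [
    "depression", "anxiety", "distressed", "demotivated", "insomnia",
    "lonely", "empty", "exhausted", "worried", "overwhelmed",
    "tired", "sad", "discouraged", "cry", "nervous",
]
# Match a filter word at a word boundary, allowing suffixes ("crying").
pattern = re.compile(r"\b(" + "|".join(FILTER_WORDS) + r")\w*", re.IGNORECASE)

def filter_control_candidates(candidates):
    """Drop any user whose recent tweets contain a filter word.

    `candidates` maps a username to a list of tweet strings.
    """
    return {
        user: tweets
        for user, tweets in candidates.items()
        if not any(pattern.search(t) for t in tweets)
    }

# Hypothetical example: user_b mentions "depression" and is removed.
sample = {
    "user_a": ["Today is my birthday!", "Great day with friends"],
    "user_b": ["Today is my birthday", "struggling with depression lately"],
}
print(list(filter_control_candidates(sample)))  # ['user_a']
```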
In the next phase, we extracted the public tweets from the period starting on the day each user posted their birthday tweet up to two weeks later, concatenated them, and saved them in the database in such a manner that each user had just one record. In other words, each user was treated as a document and labeled as non-depressed. A total of 7699 tweets were stored in the control dataset.

3) Anonymization: The usernames in both the depressed and control datasets were anonymized and replaced with numeric ids; other information such as URLs and geolocation data was also removed from the tweets.

4) Database Table Structure: The data was stored in an SQL database in two tables, control_data and depress_data. Both tables had the same columns:
• Id: identifies a particular user.
• Depressed: contains the numeric value 1 for depressed users and 0 for non-depressed users.
• Tweets: contains all the tweets of a particular user, concatenated into a single column.
• Count: contains the number of tweets retrieved for the user.

5) Caveats: Our approach for finding users has the following caveats:
• We have only considered users who posted most of their tweets in English in both of our datasets; therefore, the data retrieved cannot be considered a representative sample of all Twitter users.
• The method used in this study to identify depressed users cannot verify whether a self-reported diagnosis is true. However, it seems unlikely that users would post a false diagnosis of a condition they do not have.
• The control group might have been contaminated by users who do suffer from a mental disorder but have not self-reported their diagnosis on Twitter.
• Twitter users cannot be considered a representative sample of the whole world population.

B. Data Cleaning
We applied the following filters to preprocess our datasets.
• Conversion of emoji and emoticons to words using the Demoji and Emoticon APIs.
• Removal of URLs using regular expressions.
• Removal of non-English words and symbols using Python's Natural Language Toolkit (NLTK).
• Stop-word removal using NLTK, except for first-person pronouns: according to the study conducted by [4], they were among the words with the highest occurrence in the depressed dataset, so they were retained to serve as a feature.
• Tokenization and lemmatization of tweets, performed at the end with the help of NLTK.
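A rough Python sketch of these cleaning filters follows. The stop-word list here is a tiny illustrative stand-in for NLTK's, and the emoji conversion and lemmatization steps are omitted for brevity.

```python
import re

# Illustrative stand-in for NLTK's English stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "am", "are", "to", "and", "of", "so"}
# First-person pronouns are kept, since they serve as a feature.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}

URL_RE = re.compile(r"https?://\S+")    # strip URLs
NON_ALPHA_RE = re.compile(r"[^a-z\s]")  # strip symbols and non-English characters

def clean_tweet(text):
    """Lowercase, drop URLs and symbols, and remove stop words
    while retaining first-person pronouns."""
    text = URL_RE.sub(" ", text.lower())
    text = NON_ALPHA_RE.sub(" ", text)
    return [t for t in text.split()
            if t in FIRST_PERSON or t not in STOP_WORDS]

print(clean_tweet("I am so tired... https://t.co/xyz :("))  # ['i', 'tired']
```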
C. Feature Extraction
While it is convenient for us to exchange information in natural language, computers cannot understand it directly; they work best with numbers, specifically vectors of numbers. The process of converting text into vectors is known as feature extraction. The bag-of-words model is commonly used for this purpose. It is based on two things:
• a lexicon of known words, and
• the frequency of those known words;
however, it discards any information related to order or structure [13], as shown in Fig. 2. By feeding these vectors into our predictive models, we can estimate the likelihood of depression in a person.
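A minimal bag-of-words sketch in plain Python (in practice a library such as scikit-learn's CountVectorizer performs this step):

```python
from collections import Counter

def build_vocab(docs):
    """The lexicon of known words, collected across all user documents."""
    vocab = sorted({w for doc in docs for w in doc.split()})
    return {w: i for i, w in enumerate(vocab)}

def vectorize(doc, vocab):
    """Frequency vector of known words; word order is discarded."""
    counts = Counter(doc.split())
    return [counts.get(w, 0) for w in vocab]

docs = ["i feel empty i feel tired", "happy birthday to me"]
vocab = build_vocab(docs)
print(vectorize(docs[0], vocab))  # [0, 1, 2, 0, 2, 0, 1, 0]
```

Each user document thus becomes one fixed-length row of word counts, which is the input format the classifiers below expect.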
1) Random Forest: A random forest classifier [17] is essentially an ensemble of decision trees. Each decision tree outputs a "decision", i.e., a label that predicts the class of the data provided, and the class that appears most often is chosen as the label of the data. Increasing the number of trees generally improves prediction accuracy. Random forest classifiers are commonly used for regression, classification, and other tasks. Fig. 4 shows the basic working of the random forest algorithm.

[Fig. 4. Decision trees 1 through n each predict a class (e.g., Class A, Class B, Class A); majority voting over the trees yields the final class.]
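The majority-voting step in Fig. 4 reduces to choosing the most common per-tree label, as in this small sketch; note that scikit-learn's RandomForestClassifier [17] aggregates its trees by averaging their predicted class probabilities rather than taking hard votes.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Pick the class predicted by the most trees (Fig. 4's final step)."""
    return Counter(tree_predictions).most_common(1)[0][0]

# e.g. Tree 1 -> "A", Tree 2 -> "B", Tree n -> "A"
print(majority_vote(["A", "B", "A"]))  # A
```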
IV. Classification Metrics
A. Precision
Precision describes the accuracy of depressed predictions, i.e., how many of the documents identified as depressed were actually depressed. It is calculated using (1).

Precision = True Positives / (True Positives + False Positives)   (1)

B. Recall
Recall describes the coverage of the depressed class, i.e., how many of the actually depressed documents were identified as depressed. It is calculated using (2).

Recall = True Positives / (True Positives + False Negatives)   (2)

C. F1-Score
The F1-score was used to compare the classifiers. It is the harmonic mean of precision and recall and is used when seeking a balance between the two. It is calculated using (3).

F1-score = 2 × (Precision × Recall) / (Precision + Recall)   (3)
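Equations (1) through (3) can be computed directly from confusion-matrix counts; the counts used below are hypothetical and are not the study's results.

```python
def precision(tp, fp):
    """Eq. (1): fraction of predicted-depressed documents that are depressed."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Eq. (2): fraction of depressed documents that were identified."""
    return tp / (tp + fn)

def f1_score(p, r):
    """Eq. (3): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Hypothetical confusion counts for the depressed class.
tp, fp, fn = 60, 20, 15
p, r = precision(tp, fp), recall(tp, fn)
print(round(p, 2), round(r, 2), round(f1_score(p, r), 2))  # 0.75 0.8 0.77
```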
D. Accuracy
Accuracy is the proportion of instances predicted correctly, i.e., of all the documents identified as depressed and non-depressed, how many were correctly predicted. It is calculated by (4).

Accuracy = (True Positives + True Negatives) / (True Positives + True Negatives + False Positives + False Negatives)   (4)

We obtained the results with the Random Forest classifier shown in Table IV. The depressed class has greater precision and F1-score values, whereas the non-depressed class achieved a greater recall score. The model as a whole achieved an accuracy of 0.77.

[TABLE IV. Results of Random Forest: per-class Precision, Recall, F1-score, and Support.]

We see that while the SVM produced better precision for the depressed class and better recall for the non-depressed class, the Random Forest classifier has a better F1-score, which means there is a better balance between precision and recall for both the depressed and non-depressed classes. Moreover, the accuracy of the Random Forest is also slightly greater than that of the SVM, so it can be said that the Random Forest performed best.

VI. Conclusion

We conclude that machine learning algorithms can indeed be used to diagnose depression among social media users. However, the models need to be refined further to make them as accurate as possible; currently, the accuracy achieved by both ML models is close to 80 percent, and adding more features may push them toward 90% accuracy.
VII. Future Work
[8] A. E. Aladag, S. Muderrisoglu, N. B. Akbas, O. Zahmacioglu, and H. O. Bingol, "Detecting suicidal ideation on forums: Proof-of-Concept study," Journal of Medical Internet Research, vol. 20, no. 6, p. e215, Jun. 2018, doi: 10.2196/jmir.9840.
[9] G. Coppersmith, M. Dredze, C. Harman, K. Hollingshead, and M. Mitchell, "CLPsych 2015 shared task: depression and PTSD on Twitter," Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, 2015, doi: 10.3115/v1/w15-1204.
[10] L. Radloff, "The CES-D scale: a self-report depression scale for research in the general population," Appl. Psychol. Meas., vol. 1, no. 3, pp. 385-401, 1977.
[11] A. T. Beck, C. H. Ward, M. Mendelson, J. Mock, and J. Erbaugh, "An inventory for measuring depression," Arch. Gen. Psychiatry, vol. 4, no. 6, pp. 561-571, 1961.
[12] K. Kroenke, R. L. Spitzer, and J. B. W. Williams, "The PHQ-9: validity of a brief depression severity measure," Journal of General Internal Medicine, vol. 16, no. 9, pp. 606-613, Sep. 2001, doi: 10.1046/j.1525-1497.2001.016009606.x.
[13] J. Brownlee, "How to develop a word-level neural language model and use it to generate text," Machine Learning Mastery, Nov. 09, 2017. Accessed: Apr. 07, 2020. [Online]. Available: https://machinelearningmastery.com
[14] V. M. Prieto, S. Matos, M. Alvarez, F. Cacheda, and J. L. Oliveira, "Twitter: a good place to detect health conditions," PLoS ONE, vol. 9, no. 1, p. e86191, Jan. 2014, doi: 10.1371/journal.pone.0086191.
[15] Md. R. Islam, M. A. Kabir, A. Ahmed, A. R. M. Kamal, H. Wang, and A. Ulhaq, "Depression detection from social network data using machine learning techniques," Health Information Science and Systems, vol. 6, no. 1, Aug. 2018, doi: 10.1007/s13755-018-0046-0.
[16] "1.4. Support Vector Machines — scikit-learn 0.20.3 documentation," Scikit-learn.org, 2018. Accessed: May 02, 2020. [Online]. Available: https://scikit-learn.org
[17] "3.2.4.3.1. sklearn.ensemble.RandomForestClassifier — scikit-learn 0.20.3 documentation," Scikit-learn.org, 2018. Accessed: May 03, 2020. [Online]. Available: https://scikit-learn.org