Data Sources

Paper | Authors
CLPsych 2015 Shared Task: Depression and PTSD on Twitter | Coppersmith, Dredze, Harman, Hollingshead, Mitchell
Quantifying Mental Health Signals in Twitter | Coppersmith, Dredze, Harman
From ADHD to SAD: Analyzing the Language of Mental Health on Twitter through Self-Reported Diagnoses | Coppersmith, Dredze, Harman, Hollingshead
Towards Augmenting Crisis Counselor Training by Improving Message Retrieval | DeMasi, Hearst, Recht
Characterizing and Predicting Postpartum Depression from Shared Facebook Data | De Choudhury, Counts, Horvitz, Hoff
Content Analysis of Depression-Related Tweets | Cavazos-Rehg, Krauss, Sowles, Connolly, Rosas, Bharadwaj, Bierut
Cross-cultural differences in language markers of depression online | Loveys, Torrez, Fine, Moriarty, Coppersmith
Depressive Moods of Users Portrayed in Twitter | Park, Cha, Cha
Predicting Depression via Social Media | De Choudhury, Gamon, Counts, Horvitz
Towards Assessing Changes in Degree of Depression through Facebook | Schwartz, Eichstaedt, Kern, Park, Sap, Stillwell, Kosinski, Ungar
Recognizing Depression From Twitter Activity | Tsugawa, Kikuchi, Kishino, Nakajima, Itoh, Ohsaki
Forecasting the Onset and Course of Mental Illness with Twitter Data | Reece, Reagan, Lix, Dodds, Danforth, Langer
Measuring post traumatic stress disorder in Twitter | Coppersmith, Harman, Dredze
The role of personality, age, and gender in tweeting about mental illnesses | Preotiuc-Pietro, Eichstaedt, Park, Sap, Smith, Tobolsky, Schwartz, Ungar
Exploratory Analysis of Social Media Prior to a Suicide Attempt | Coppersmith, Ngo, Leary, Wood
Identifying Depression on Twitter | Nadeem, Horn, Coppersmith, Sen
The language of mental health problems in social media | Gkotsis, Oellrich, Hubbard, Dobson, Liakata, Velupillai, Dutta
Discovering shifts to suicidal ideation from mental health content in social media | De Choudhury, Kiciman, Dredze, Coppersmith, Kumar
Social Media Based Index of Mental Well-Being in College Campuses | Bagroy, Kumaraguru, De Choudhury
Quantifying and Predicting Mental Illness Severity in Online Pro-Eating Disorder Communities | Chancellor, Lin, Goodman, Zerwas, De Choudhury
Detecting Suicidality on Twitter | O'Dea, Wan, Batterham, Calear, Paris, Christensen
Activities on Facebook Reveal the Depressive State of Users | Park, Lee, Kwak, Cha, Jeong
Detecting Linguistic Traces of Depression in Topic-Restricted Text: Attending to Self-Stigmatized Depression with NLP | Wolohan, Hiraga, Mukherjee, Sayyed
Multi-Task Learning for Mental Health using Social Media Text | Benton, Mitchell, Hovy
Text-based Detection and Understanding of Changes in Mental Health | Li, Mihalcea, Wilson
Mixed-Initiative Real-Time Topic Modeling & Visualization for Crisis Counseling | Dinakar, Chen, Lieberman, Picard, Filbin
Quantifying Mental Health from Social Media with Neural User Embeddings | Amir, Coppersmith, Carvalho, Silva, Wallace
Can Text Messages Identify Suicide Risk in Real Time? A within-subjects pilot examination of temporally sensitive markers of suicide risk | Glenn, Nobles, Barnes, Teachman
Protecting User Privacy and Rights in Academic Data-Sharing Partnerships: Principles from a pilot program at Crisis Text Line | Pisani, Kanuri, Filbin, Gallo, Gould, Lehmann, Levine, Marcotte, Pascal, Rousseau, Turner, Yen, Ranney
CLPsych 2016 Shared Task: Triaging Content in Online Peer Support Forums | Milne, Pink, Hachey, Calvo
Detecting Comments Showing Risk for Suicide in YouTube | Gao, Cheng, Yu
Detecting Low Self-Esteem in Youths from Web Search Data | Zaman, Acharyya, Kautz, Silenzio
Expert, Crowdsourced, and Machine Assessment of Suicide Risk via Online Postings | Shing, Nair, Zirikly, Friedenberg, Daumé III, Resnik
Identification of Imminent Suicide Risk Among Young Adults using Text Messages | Nobles, Glenn, Kowsari, Teachman, Barnes
Depression and Self-Harm Risk Assessment in Online Forums | Yates, Cohan, Goharian
Learning from various labeling strategies for suicide-related messages on social media: An experimental study | Liu, Chen, Homan, Silenzio
CLPsych 2019 Shared Task: Predicting the Degree of Suicide Risk in Reddit Posts | Zirikly, Resnik, Uzuner, Hollingshead
Towards Automatically Classifying Depressive Symptoms from Twitter Data for Population Health | Mowery, Park, Conway, Bryan
Can acute suicidality be predicted by Instagram data? Results from qualitative and quantitative language analyses | Brown, Bendig, Fischer, Goldwich, Baumeister, Plener
Understanding and Fighting Bullying with Machine Learning | Junming Sui
SMHD: A Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions | Cohan, Desmet, Yates, Soldaini, MacAvaney, Goharian
RSDD-Time: Temporal Annotation of Self-Reported Mental Health Diagnoses | MacAvaney, Desmet, Cohan, Soldaini, Yates, Zirikly, Goharian
Towards Developing an Annotation Scheme for Depressive Disorder Symptoms: A Preliminary Study using Twitter Data | Mowery, Bryan, Conway
Detecting Changes in Suicide Content Manifested in Social Media Following Celebrity Suicides | Kumar, Dredze, Coppersmith, De Choudhury
Using Linguistic Features to Estimate Suicide Probability of Chinese Microblog Users | Zhang, Huang, Liu, Chen, Zhu
Identifying Chinese Microblog Users with High Suicide Probability Using Internet-Based Profile and Linguistic Features: Classification Model | Guan, Hao, Cheng, Yip, Zhu
Detecting Suicidal Ideation in Chinese Microblogs with Psychological Lexicons | Huang, Zhang, Liu, Chiu, Li, Zhu
The Language of Social Support in Social Media and its Effect on Suicidal Ideation Risk | De Choudhury, Kiciman
Mental Health Surveillance over Social Media with Digital Cohorts | Amir, Dredze, Ayers
Feature Attention Network: Interpretable Depression Detection from Social Media | Song, You, Chung, Park
Natural Language Processing for Mental Health: Large Scale Discourse Analysis of Counseling Conversations | Althoff, Clark, Leskovec
The Role of Features and Context on Suicide Ideation Detection | Wang, Wan, Paris
User Dynamics in Mental Health Forums – A Sentiment Analysis Perspective | Davcheva, Adam, Benlian
Adapting Deep Learning Methods for Mental Health Prediction on Social Media | Sekulić, Strube
Multi-Task, Multi-Channel, Multi-Input Learning for Mental Illness Detection using Social Media Text | Kirinde Gamaarachchige, Inkpen
Dreaddit: A Reddit Dataset for Stress Analysis in Social Media | Turcan, McKeown
Dilated LSTM with attention for Classification of Suicide Notes | Schoene, Lacy, Turner, Dethlefs
Latent Suicide Risk Detection on Microblog via Suicide-Oriented Word Embeddings and Layered Attention | Cao, Zhang, Feng, Wei, Wang, Li, He
Gender and Cross-Cultural Differences in Social Media Disclosures of Mental Illness | De Choudhury, Sharma, Logar, Eekhout, Clausen Nielsen
Tracking Suicide Risk Factors Through Twitter in the US | Jashinsky, Burton, Hanson, West, Giraud-Carrier, Barnes, Argyle
Deep Learning for Depression Detection of Twitter Users | Husseini Orabi, Buddhitha, Husseini Orabi, Inkpen
Attention-based LSTM for Psychological Stress Detection from Spoken Language Using Distant Supervision | Winata, Pepijn Kampman, Fung
Suicidal Trend Analysis of Twitter using Machine Learning and Neural Network | Shahreen, Subhani, Mahfuzur Rahman
Monitoring Tweets for Depression to Detect At-risk Users | Jamil
Facebook language predicts depression in medical records | Eichstaedt, Smith, Merchant, Ungar, Crutchley, Preotiuc-Pietro, Asch, Schwartz
A multilevel predictive model for detecting social network users with depression | Wongkoblap, Vadillo, Curcin
Instagram photos reveal predictive markers of depression | Reece, Danforth
Exploring the utility of community-generated social media content for detecting depression: an analytical study on Instagram | Ricard, Marsch, Crosier, Hassanpour
Predicting Multiple Risky Behaviors via Multimedia Content | Zhou, Zhang, Luo
Triaging content severity in online mental health forums | Cohan, Young, Yates, Goharian
Helping or hurting? Predicting changes in users’ risk of self-harm through online community interactions | Soldaini, Walsh, Cohan, Han, Goharian
Detecting suicidal ideation on forums: proof-of-concept study | Aladağ, Muderrisoglu, Akbas, Zahmacioglu, Bingol
Norms matter: contrasting social support around behavior change in online weight loss communities | Chancellor, Hu, De Choudhury
Measuring the impact of anxiety on online social interactions | Dutta, Ma, De Choudhury
Characterization of mental health conditions in social media using Informed Deep Learning | Gkotsis, Oellrich, Velupillai, Liakata, Hubbard, Dobson, Dutta
Within and between-person differences in language used across anxiety support and neutral reddit communities | Ireland, Iserman
Hierarchical neural model with attention mechanisms for the classification of social media text related to mental health | Ive, Gkotsis, Dutta, Stewart, Velupillai
Identifying depression on reddit: The effect of training data | Pirina, Çöltekin
Measuring the latency of depression detection in social media | Sadeque, Xu, Bethard
Modeling Stress with Social Media Around Incidents of Gun Violence on College Campuses | Saha, De Choudhury
Detecting anxiety on Reddit | Hanwen Shen, Rudzicz
Assessing suicide risk and emotional distress in Chinese social media: a text mining and machine learning study | Cheng, Li, Kwok, Zhu, Yip
Topic Model for Identifying Suicidal Ideation in Chinese Microblog | Huang, Li, Zhang, Liu, Chiu, Zhu
User-level psychological stress detection from social media using deep neural network | Lin, Jia, Guo, Xue, Li, Huang, Cai, Feng
Psychological stress detection from cross-media microblog data using deep sparse neural network | Lin, Jia, Guo, Xue, Li, Huang, Cai, Feng
A depression detection model based on sentiment analysis in micro-blog social network | Wang, Zhang, Ji, Sun, Wu, Bao
Cross-domain depression detection via harvesting social media | Shen, Jia, Shen, Feng, He, Luan, Tang, Tiropanis, Chua, Hall
Teenagers’ stress detection based on time-sensitive microblog comment/response actions | Zhao, Jia, Feng
Recovery Amid Pro-Anorexia: Analysis of Recovery in Social Media | Chancellor, Mitra, De Choudhury
Anorexia on Tumblr: A Characterization Study on Anorexia | De Choudhury
Detecting cognitive distortions through machine learning text analytics | Simms, Ramstedt, Rich, Richards, Martinez, Giraud-Carrier
A collaborative approach to identifying social media markers of schizophrenia by employing machine learning and clinical appraisals | Birnbaum, Kiranmai Ernala, Rizvi, De Choudhury, Kane
Validating machine learning algorithms for twitter data against established measures of suicidality | Braithwaite, Giraud-Carrier, West, Barnes, Lee Hanson
Machine Classification and analysis of suicide-related communication on Twitter | Burnap, Colombo, Scourfield
Predicting postpartum changes in emotion and behavior via social media | De Choudhury, Counts, Horvitz
Social Media As a Measurement Tool of Depression in Populations | De Choudhury, Counts, Horvitz
Toward Macro-Insights for Suicide Prevention: Analyzing Fine-Grained Distress at Scale | Homan, Johar, Liu, Lytle, Silenzio, Alm
Small but Mighty: Affective Micropatterns for Quantifying Mental Health from Social Media Language | Loveys, Crutchley, Wyatt, Coppersmith
Mining Twitter data to improve detection of schizophrenia | McManus, Mallory, Goldfeder, Haynes, Tatum
Quantifying the language of schizophrenia in social media | Mitchell, Hollingshead, Coppersmith
Twitter: a good place to detect health conditions | Prieto, Matos, Alvarez, Cacheda, Oliveira
Beyond LDA: exploring supervised topic modeling for depression-related language in Twitter | Resnik, Armstrong, Claudino, Nguyen, Nguyen, Boyd-Graber
Inferring Mood Instability on Social Media by Leveraging Ecological Momentary Assessments | Saha, Chan, Barbaro, Abowd, De Choudhury
MIDAS: Mental illness detection and analysis via social media | Saravia, Chang, Jollet De Lorenzo, Chen
Predicting depression from language-based emotion dynamics: longitudinal analysis of Facebook and twitter status updates | Seabrook, Kern, Fulcher, Rickard
Depression detection via harvesting social media: A multimodal dictionary learning solution | Shen, Jia, Feng, Zhang, Hu, Chua, Zhu
On estimating depressive tendency of twitter users from their tweet data | Tsugawa, Mogi, Kikuchi, Kishino, Fujita, Itoh, Ohsaki
Emotional and Linguistic Cues of Depression from Social Media | Vedula, Parthasarathy
Detecting and Characterizing Eating-Disorder Communities on Social Media | Wang, Brede, Ianni, Mentzakis
Defining patients with depressive disorder by using textual information | Nakamura, Kubo, Usuda, Aramaki
Affective and content analysis of online depression communities | Nguyen, Phung, Dao, Venkatesh, Berk
Understanding and Discovering Deliberate Self-harm Content in Social Media | Wang, Tang, Li, Li, Wan, Mellina, O'Hare, Chang
Exploiting Temporal Information in a Two-Stage Classification Framework for Content-Based Depression Detection | Shen, Kuo, Chen, Lin
Suicide Ideation of Individuals in Online Social Networks | Masuda, Kurahashi, Onari
Methodological Gaps in Predicting Mental Health States from Social Media: Triangulating Diagnostic Signals | Ernala, Birnbaum, Candan, Rizvi, Sterling, Kane, De Choudhury
What does social media say about your stress? | Lin, Jia, Nie, Shen, Chua
An improved model for depression detecting in micro-blog social network | Wang, Zhang, Sun
Suicide Risk Assessment with Multi-level Dual-Context Language and BERT | Matero, Idnani, Son, Giorgi, Vu, Zamani, Limbachiya, Guntuku, Schwartz
Accommodating Grief on Twitter: An Analysis of Expressions of Grief Among Gang Involved Youth on Twitter Using Qualitative Analysis and Natural Language Processing | Upton Patton, MacBeth, Schoenebeck, Shear, McKeown
Automatic detection of eating disorder-related social media posts that could benefit from a mental health intervention | Yan, Fitzsimmons-Craft, Goodman, Krauss, Das, Cavazos-Rehg
Automatic detection of cyberbullying in social media text | Van Hee, Jacobs, Emmery, Desmet, Lefever, Verhoeven, De Pauw, Daelemans, Hoste
Benchmarking Aggression Identification in Social Media | Kumar, Ojha, Malmasi, Zampieri
Detection of Depression-related Posts in Reddit Social Media Forum | Tadesse, Lin, Xu, Yang
Detection of Suicide-Related Posts in Twitter Data Streams | Vioulès, Moulahi, Azé, Bringay
BioInfo@UAVR at eRisk 2019: delving into social media texts for the early detection of mental and food disorders | Trifan, Luís Oliveira
#MyDepressionLooksLike: Examining Public Discourse About Depression on Twitter | Lachmar, Wittenborn, Bogen, McCauley
A Framework for Early Detection of Antisocial Behavior on Twitter Using Natural Language Processing | Singh, Du, Zhang, Wang, Miao, Sianaki, Ulhaq
Natural Language Processing of Social Media as Screening for Suicide Risk | Coppersmith
Not Just Depressed: Bipolar Disorder Prediction on Reddit | Sekulić, Gjurković, Šnajder
Perception Differences between the Depressed and Non-Depressed Users in Twitter | Park, McDonald, Cha
Predictive linguistic features of schizophrenia | Sarioglu Kayi, Diab, Pauselli, Compton, Coppersmith
Quick and (maybe not so) Easy Detection of Anorexia in Social Media Posts | Mohammadi, Amini, Kosseim
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media | Hossein Yazdavar, Al-Olimat, Ebrahimi, Bajaj, Banerjee, Thirunarayan, Pathak, Sheth
Using Topic Modeling to Detect and Describe Self-Injurious and Related Content on a Large-Scale Digital Platform | Franz, Nook, Mair, Nock
Understanding Depressive Symptoms and Psychosocial Stressors on Twitter: A Corpus-Based Study | Mowery, Smith, Cheney, Stoddard, Coppersmith, Bryan, Conway
"Let Me Tell You About Your Mental Health!" Contextualized Classification of Reddit Posts to DSM-5 for Web-based Intervention | Gaur, Kursuncu, Alambo, Sheth, Daniulaityte, Thirunarayan, Pathak
Identifying Depressive Users in Twitter Using Multimodal Analysis | Kang, Yoon, Yi Kim
Inferring Social Media Users' Mental Health Status from Multimodal Information | Xu, Pérez-Rosas, Mihalcea
eRISK 2017: CLEF Lab on Early Risk Prediction on the Internet: Experimental Foundation | Losada, Crestani, Parapar
Overview of eRisk: Early Risk Prediction on the Internet | Losada, Crestani, Parapar
Year Platform
2015 Twitter
2014 Twitter
2015 Twitter
2019 Synthetic Crisis Text Conversations
2014 Facebook
2017 Twitter
2018 7 Cups of Tea (Chat-based peer support platform)
2012 Twitter
2013 Twitter
2014 Facebook
2015 Twitter
2016 Twitter
2014 Twitter
2015 Twitter
2016 Twitter
2017 Twitter
2016 Reddit
2016 Reddit
2017 Reddit
2016 Instagram
2015 Twitter
2013 Facebook
2018 Reddit
2017 Twitter
2018 Reddit
2015 Crisis Text Line
2017 Twitter
2018 SMS
2016 ReachOut (Online Forum)
2018 YouTube
2019 Google Search
2018 Reddit
2018 SMS, emails, and call history, social media data (i.e., Twitter and Facebook), web browsing history
2017 Reddit
2017 Twitter
2019 Reddit
2016 Twitter
2019 Instagram
2015 Twitter
2018 Reddit
2018 Reddit
2015 Twitter
2015 Reddit, Wikipedia
2014 Sina Weibo
2015 Sina Weibo
2014 Sina Weibo
2017 Reddit
2019 Twitter
2018 Reddit
2019 Twitter
2019 Three online mental-health forums
2019 Reddit
2019 Twitter
2019 Reddit
2019 Death Row Last Statements, The Kernel, Tumblr
2019 Sina Weibo
2017 Twitter
2014 Twitter
2018 Twitter
2018 Twitter, Interview
2018 Twitter
2017 Twitter
2018 Facebook
2018 Facebook
2017 Instagram
2018 Instagram
2017 Instagram
2018 Reddit
2018 Reddit
2018 Twitter
2017 Reddit
2018 Reddit
2018 Reddit
2018 Reddit, Online Support Forums
2018 Reddit
2017 Reddit
2017 Reddit
2017 Sina Weibo
2015 Sina Weibo
2014 Sina Weibo, Tencent Weibo, Twitter
2014 Sina Weibo
2013 Sina Weibo
2018 Twitter, Weibo
2015 Tencent Weibo
2016 Tumblr
2015 Tumblr
2017 Tumblr
2017 Twitter
2016 Twitter
2015 Twitter
2013 Twitter
2013 Twitter
2014 Twitter
2017 Twitter
2015 Twitter
2015 Twitter
2014 Twitter
2015 Twitter, Essays
2017 Twitter, Facebook, Ecological Momentary Assessments
2016 Twitter
2018 Facebook, Twitter
2017 Twitter
2013 Twitter
2017 Twitter
2017 Twitter
2014 TOBYO Toshoshitsu (Disease Survivor Blogs)
2014 LiveJournal
2017 Flickr
2013 PTT (Bulletin Board System)
2013 Mixi
2019 Twitter, Facebook
2016 Sina Weibo
2013 Sina Weibo
2019 Reddit
2018 Twitter
2019 Reddit
2018 AskFM
2018 Facebook, Twitter
2019 Reddit, Online Support Forums
2017 Twitter
2019 Reddit
2017 Twitter
2019 Twitter
2018 Twitter
2018 Reddit
2013 Twitter
2018 Twitter, Essays
2019 Reddit
2017 Twitter
2019 TeenHelp.org (Forum)
2017 Twitter
2018 Reddit
2019 Twitter
2020 Flickr
2017 Reddit
2018 Reddit
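Several labeling methodologies in this table start from regular expressions over self-reported diagnosis statements (e.g. "I was just diagnosed with X") and then add manual annotation of correctness. The Python sketch below illustrates only that first matching step, under stated assumptions: the condition list, the exact pattern wording, and the names CONDITION_PATTERNS, build_pattern, and candidate_diagnoses are hypothetical and are not the patterns used by any of the studies listed here.

```python
import re

# Hypothetical condition patterns for illustration only; each study in the
# table used its own (typically larger) pattern set.
CONDITION_PATTERNS = {
    "depression": r"depression",
    "ptsd": r"ptsd|post[- ]traumatic stress disorder",
    "bipolar": r"bipolar(?: disorder)?",
}


def build_pattern(condition: str) -> re.Pattern:
    """Compile a self-disclosure regex of the form 'I was just diagnosed with X'."""
    return re.compile(
        r"\bi\s+(?:was|am|have\s+been|got)\s+"  # first-person subject + auxiliary
        r"(?:just\s+|recently\s+)?"             # optional recency adverb
        r"diagnosed\s+with\s+"
        r"(?:" + condition + r")\b",            # the condition name itself
        re.IGNORECASE,
    )


COMPILED = {name: build_pattern(rx) for name, rx in CONDITION_PATTERNS.items()}


def candidate_diagnoses(text: str) -> set:
    """Return the conditions whose self-disclosure pattern matches the text.

    Matches are only candidates; a phrase such as
    "I wish I was diagnosed with depression" also matches.
    """
    return {name for name, pattern in COMPILED.items() if pattern.search(text)}


if __name__ == "__main__":
    posts = [
        "I was just diagnosed with PTSD last week",
        "i have been diagnosed with bipolar disorder.",
        "My friend thinks she has depression",
    ]
    for post in posts:
        print(sorted(candidate_diagnoses(post)), "<-", post)
```

Because a pattern this loose also fires on wishes, jokes, and quoted speech, the entries that rely on it generally pair the match with manual verification and, for controls, age- and gender-matched sampling.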
Target Outcomes | Labeling Methodology
Depression, PTSD, Control | Regular expressions (e.g. "I was just diagnosed with X"); age- and gender-matched controls; manual annotation of correctness
Bipolar Disorder, Depression, PTSD, Seasonal Affective Disorder (SAD) | matched controls; manual annotation of correctness
ADHD, Anxiety, Bipolar Disorder, Borderline Personality Disorder, Depression, Eating, OCD, PTSD, Schizophrenia, Seasonal Affective Disorder | Regular expressions (e.g. "I was just diagnosed with X"); age- and gender-matched controls; manual annotation of correctness
None (Message Retrieval Task) | Trained counselor role-play
Postpartum Depression | PHQ-9 survey with confirmed clinical diagnoses
Feelings of Depression, Support for Depression, School or Work-related Pressures related to Depression, Substance use to deal with depression, self-harm or suicidal thoughts | Random sample of Twitter using keyword matching; manual annotation of categories
None | Survey; users of the platform volunteer their cultural heritage
Depression, Control | CES-D Survey
Depression | CES-D Survey + BDI
Continuous Depression Score | Survey (Big 5 Personality)
Depression | CES-D Survey
Depression, PTSD | CES-D Survey
PTSD | Regular expressions
Depression, PTSD of correctness
Suicide Attempt | Regular expressions with manual annotation/verification
Anxiety, Borderline Personality, Bipolar, Opiate Addiction, Self Harm, Addiction, Asperger's, Autism, Alcoholism, Opiate Usage, Schizophrenia, Self-harm, Suicidal Ideation | Subreddit participation
Depression, Mental Health (General), Trauma, Bipolar, Borderline Personality, PTSD, Psychosis, Eating Disorders, Self Harm, Rape Survivors, Panic, Social Anxiety, Suicidal Ideation | Subreddit participation
14 mental-health related subreddits + small set of control subreddits (e.g. r/AskReddit) | Subreddit participation (inductive transfer learning applied to users outside the original subreddits)
Eating Disorder | Topic Modeling + Clinical Annotation of the Topics; posts sourced using keyword seeds
Suicidal Ideation | Keyword matching + manual annotation (3 levels of concern)
Depression | CES-D Survey
Depression | Subreddit participation (starting a thread in r/depression)
Neuroatypicality, Suicide Attempt, Anxiety, Depression, Eating Disorder, Panic Attacks, Schizophrenia, Bipolar Disorder, PTSD | Regular expressions; age- and gender-matched controls
Change in Mental Health Disorder Communication | Change in participation between various mental health subreddits
None | Platform participation
Periods of Suicide Attempts, Suicidal Ideation, Depressive Episodes, Positive Mood | Self-identification during in-person interview
None | None
Self-harm | Moderator annotated (4 levels of risk)
Suicidal Ideation | Keyword video search, manual comment annotation
Self-esteem | Survey + Rosenberg Self-Esteem Scale
Suicidal Ideation | Random sampling and then manual annotation by expert + non-expert annotators; 4 levels of risk
Suicidal Ideation | Survey + interview
Depression | High-precision keyword search; manual annotation
Suicidal Risk | Keyword-based sample; crowdsourced and expert annotations
Suicidal Ideation | Regular expression, subreddit participation, and manual annotation (4 levels of risk)
Depressive Symptoms and Stressors Associated with Depression | Regular expression matching; manual annotation according to DSM-5 and DSM-IV
Non-suicidal Self-Injury | Pictures of self-harm
Cyberbullying | Manual annotation
ADHD, Anxiety, Autism, Bipolar Disorder, Depression, Eating Disorder, Obsessive Compulsive Disorder, PTSD, Schizophrenia | High-precision regular expressions
Depression Diagnosis Date | Random sample with manual annotation
Major Depressive Disorder | Manual annotation, hierarchical schema; data queried from API using keywords
Suicidal Ideation | Participation in r/SuicideWatch
Suicidal Ideation | Survey (Suicide Probability Scale)
Suicidal Ideation | Survey (Suicide Probability Scale)
Suicidal Ideation | Manual annotation
Suicidal Ideation | Subreddit participation; classes distinguished based on time period of posting and movement between r/SuicideWatch and other mental-health subreddits
Counseling Outcome | Survey (interactive text-messaging sessions) regarding success of session
Suicidal Ideation | Keyword matching + manual annotation (3 levels of concern)
Sentiment | Forum participation (Depression, Bipolar Disorder, Anxiety, Panic Attacks, ADHD, Borderline Personality Disorder, OCD, PTSD)
Stress | Manual annotation via Mechanical Turk (Stress, Not Stress, Can't Tell)
Suicide, Imminent Death, Depression, Loneliness | Death Row last statements (Texas Department of Criminal Justice); known suicide and depression notes on Tumblr
Suicidal Ideation | Commenting in "Tree Hole" (a page on Sina Weibo from an individual who died by suicide)
Suicidal Ideation | Regular expressions
Suicidal Ideation | Key phrases/keywords based on suicide-related terms. Additional set of filter criteria to remove sarcastic/irrelevant material. Two raters verified a random sample of 1,000 tweets (agreement 79.6% of the time). Of the 1,000 tweets, 789 were found to be relevant.
Depression | CLPsych: Regular expressions (e.g. "I was just diagnosed with X"); age- and gender-matched controls; manual annotation of correctness. Bell Let's Talk: hashtag (#BellLetsTalk) and manual annotation
Stress | Interview Corpus: manual annotation from 3 human judges. Twitter Corpus: distant supervision based on hashtag usage at the end of a tweet
Suicidal Ideation | Key phrases/keywords; unclear how the negative class is determined
Depression | Bell Let's Talk: hashtag (#BellLetsTalk) and manual annotation; additional filtering done to remove promotional material
Depression | International Classification of Diseases (ICD) codes in medical records
Life Satisfaction, Depression | Satisfaction with Life Scale, CES-D
Depression | CES-D
Depression | PHQ-8 survey
Depression, Drug Use, Alcohol Use, Sleep Disorder, Eating Disorder | Hashtags, Sentiment
Self-harm/Suicidal Ideation | Manual annotation via forum moderators
Change in Distress (Self-harm/Suicidal Ideation) | Manual annotation via forum moderators, used to train a model that labels severity into Green (safe) and Flagged (top 3 levels of crisis)
Suicidal Ideation | Manual annotation of posts from r/SuicideWatch, r/Depression, r/Anxiety, and r/ShowerThoughts
Weight loss support vs. Eating-disorder encouragement | Binary classification task of whether a comment was posted in r/Loseit or r/proED
Change in social interaction based on inferred anxiety | Regular expressions to identify Twitter users with self-disclosed anxiety; manual annotation of validity for the users. Tweets annotated as being anxiety-related based on keywords (LIWC + embedding similarity expansion) and a model trained on r/Anxiety vs. Control data
Borderline Personality Disorder, Bipolar Disorder, Schizophrenia, Anxiety, Depression, Self-harm, Suicidality, Addiction, Alcoholism, Opiates, Autism, and Control | Post-level classification for each mental health condition; assignments based on subreddit. Authors manually labeled 160 posts from the given 16 mental-health subreddits to verify that they consistently matched the nominal purpose of the given subreddit
Anxiety | Identified users who posted in anxiety-related subreddits, then selected control users based on the non-mental-health-related subreddits that the anxiety users tended to post in
Borderline Personality Disorder, Bipolar Disorder, Schizophrenia, Anxiety, Depression, Self-harm, Suicidality, Addiction, Alcoholism, Opiates, Autism, and Control | Post-level classification for each mental health condition; assignments based on subreddit. Authors manually labeled 160 posts from the given 16 mental-health subreddits to verify that they consistently matched the nominal purpose of the given subreddit
Depression, Breast Cancer Support, Family Support, Relationship Support | External data set of posts made in an online depression forum (Ramirez-Esparza et al. 2008) and an online breast cancer support forum (Gorbunova 2007); Reddit data set curated using regular expressions on a post level (e.g. "I was diagnosed with") in the r/Depression subreddit; control users sampled from breast cancer, family, and relationship support subreddits; no manual verification (expect false positives)
Depression | Depression users identified from r/depression based on regular expressions (disclosures) and then manually authenticated by the authors (physician diagnosis required). Control group randomly sampled from non-depression subreddits and from users who posted in r/depression but didn't report having depression
Stress | Collect posts from the r/stress community (1,402) to use as the "high stress" class. 100,000 random posts sampled by crawling the landing page of Reddit to create the "low stress" class.
Anxiety | Collect posts from anxiety-related subreddits (r/anxiety, r/panicparty, r/healthanxiety, r/socialanxiety). Sampled posts from other first-person point-of-view subreddits (see paper) to serve as the control group
Suicidal Ideation, Depression, Anxiety, and Stress | Web-based survey asked respondents to fill out the Chinese version of the Suicide Probability Scale (SPS); the Chinese version of the DASS-21 was used to measure the respondents' emotional distress; and Weibo Suicide Communication (a single question asking whether a respondent posted anything in the last 12 months indicating they wanted to kill themselves)
Suicidal Ideation | Human annotators determined the level of suicidal ideation in each microblog: 3 levels (suicide warning sign but no plan, suicide plan but no attempt, plan and attempt). All levels transformed into a single binary task: suicidal ideation or not
Stress | Regular expressions find "I feel stressed" vs. "I feel relaxed" at a weekly level per individual
Stress | Hashtag-based labeling approach; hashtags fall into 5 categories (affection, work, social, physiological, and others). Also crawl tweets with no stress hashtags. Randomly sampled 500 tweets for labeling by 3 people to validate the accuracy of distant supervision (95% accuracy)
Depression | Questionnaires and interviews by psychologists
Depression | Regular expressions, manual verification. Control group sampled randomly. A sample of post history within 4 weeks around the post is used for labeling
Stress | Authors interviewed high school students and had them manually scan their own tweets to annotate them on four stress categories
Anorexia (Recovery) | Snowball sampling to identify eating disorder-related posts (304 tags, 55,334 posts, 18,923 users); collect historical data for 13,317 active users; identified subset of tags related to recovery and relapse, requiring 5 distinct posts with associated tags to be assigned to the recovery or relapse group; manual verification (Cohen's kappa .83) for a random sample of 150 recovery users by researchers and a clinical psychologist
Anorexia (Recovery), Anorexia | Snowball sampling to identify eating disorder-related posts (304 tags, 55,334 posts, 18,923 users); separate pro-recovery and pro-ana communities based on tag subset (identified using co-occurrence methods); sampled 32,000 control posts using the 10 most frequent tags (e.g. GIF, art, food) meant to represent general Tumblr use
Cognitive Distortion | Query posts containing "personal", "lonely", "pathetic", and "sad" tags (493 total, 459 actually readable); each post annotated by 4 of the authors as exhibiting distorted or undistorted thought patterns
Schizophrenia | Regular expressions to identify Twitter users with self-disclosed schizophrenia (21,254 posts by 15,504 users); randomly sampled 671 users for manual clinical appraisal; control group was a random sample without mentions of schizophrenia or psychosis; a psychiatrist and a graduate-level mental health clinician used the disclosure tweet and +/- 10 surrounding tweets to verify authenticity
Suicidality | Recruit participants via MTurk; participants take the DSI-SS (Depressive Symptom Inventory-Suicidality Subscale). Participants with score > 2 labeled as suicidal.
Suicidal Ideation | Use 4 suicide-related websites (experienceproject, enotalone, takethislife, recoveryourlife) and Tumblr as data for identifying a suicide-related lexicon (via TF-IDF); track the 62 resulting keywords (n-grams 1-5) on Twitter for 6 weeks starting in Feb. 2014; sample 800 tweets containing keywords and an additional 200 containing names of publicized suicides; crowdsource 4 annotations per tweet amongst 7 categories to distinguish suicide relevance; removed tweets with < 75% agreement
Behavioral Change for New Mothers (re: Postpartum Depression) | Use newspaper birth announcements to construct a lexicon of n-grams likely to indicate birth; name-based sex inference identifies females; MTurkers shown +/- 5 posts around a possible announcement to verify new-mother status (5 ratings per candidate mother); multiple binary classification tasks based on extreme change for 33 behavioral measures (threshold selected per measure)
Depression | CES-D Survey on MTurk + self-reported experience with depression; user groups based on extreme values (low + high) of CES-D; user-level labels propagated to post-level labels
Distress Level: Start with a corpus of 2.5 million tweets (Sadilek et al. (2012)) from 6,237 users in New York City; sampled 1,370 tweets with the highest LIWC "sad" score and an additional disjoint 630 tweets matching suicide-risk-factor keywords (2,000 tweets total); half of the tweets annotated by a novice and half by a counseling psychologist with 4 labels (Happy, No distress, Low distress, High distress) based on ±3 tweets around the match
Suicide Attempt, Schizophrenia, Panic, Eating, Anxiety: Self-disclosed diagnosis statements and manual verification; control group contains age- and gender-matched controls; same as the "Multi-Task Learning" (Benton et al.) dataset, filtered down to users who post multiple times within a 3-hour window
Schizophrenia: Schizophrenia identified if 2+ of the following conditions hold: self-disclosed diagnosis in user description, self-disclosure in a status update, follows @schizotribe; control set sampled from the 1% random stream and age-matched (manually)
Schizophrenia: Self-disclosed statements identified using regular expressions; each disclosure manually verified by one of the authors; age- and gender-matched control group sampled from the 1% random stream
Depression, Eating Disorders: Spanish and Portuguese regular expressions to identify tweets likely to contain each disease; tweets were manually annotated as positive (content indicates user has disease), negative (content indicates user does not have disease), or undecided (neither) by 2 medical doctors and 3 engineers
Depression: CLPsych 2015 Shared Task data (e.g. regex for self-disclosed diagnoses); Pennebaker and King (1999) stream-of-consciousness essays in which students write down their thoughts, sensations, and feelings as they come to them
Mood Instability, Bipolar Disorder, Borderline Personality Disorder: Mood and valence captured using the Photographic Affect Meter in EMAs; participants took a battery of validated questionnaires (Perceived Stress Scale, Depression Anxiety and Stress Scale, Flourishing Scale) at the start of the study; supplemental Twitter data labeled using regular-expression disclosures with manual verification of authenticity
Borderline Personality Disorder, Bipolar Disorder: Identify candidate users using followers of popular Borderline Personality and Bipolar Disorder accounts; regular expressions against Twitter profiles to identify Borderline Personality and Bipolar Disorder; random users sampled in equal quantity to form the control group
Depression: Participants took the PHQ-9 within the mobile app MoodPrism
Depression: Regular expressions (strict match) to identify depressed users; users with no mention of the "depress" character string annotated as the control group; candidate depression group created using a loose character match to "depress"
Depression: Zung's Self-Rating Depression Scale used to evaluate the level of depression in each individual
Depression: Depressed users identified using a small set of relevant keywords and explicit reporting of taking antidepressant medication; control group consists of a random sample of users in the United States
Eating Disorder: Use regular expressions (eating-disorder-related keywords) within profiles to identify candidates and further require that profiles contain biological information; snowball sampling of the original candidates' followers used to expand the set; manual labeling of 1,000 samples to quantify the precision of the labeling process; 2 control groups sampled (1 randomly and 1 based on followers of popular music artists with name-inferred female sex)
Depression: Blog articles classified by symptoms (e.g. depression, breast cancer); depression group randomly selected based on the depression tag; non-depression group selected randomly from the remaining articles
Depression, Self-Harm, Suicide, Bipolar Disorder, Grief: Identified mental-health communities (e.g. adult_bipolar, alonedepressed) and general communities (e.g. curlyhair, cat_lovers) to serve as comparison groups (CLINICAL vs. CONTROL)
Self-harm: Use "selfharm" and "selfinjury" tags to seed a search of Flickr for additional high-precision relevant tags; identify 15 additional tags to find additional candidates; remove users in the candidate pool who use self-harm tags in fewer than 5 posts; control group sampled from the YFCC dataset and confirmed not to use any self-harm tags; researchers manually verified a subset of self-harm posts
Depression: Users of the "Prozac" message board labeled as depressed, while those on the "Sad" message board labeled as having ordinary sadness; also look at two happiness message boards, "gossiping" and "happy", to represent messages with non-negative emotions
Suicidal Ideation: Users of 4 suicide-related forums and 10 depression-related forums on the platform considered part of the "suicide" and "depression" groups, respectively, while users who did not participate in those forums but were active were considered "control"
Schizophrenia: Consider 4 types of labels: Affiliation Data (based on following the Schizophrenia and Related Disorders Alliance of America account), Self-Report (e.g. regular expressions), Clinically-appraised Self-report (e.g. self-report + clinician assessment of history), and Schizophrenic Patients in an IRB Study. Control data sampled to match based on data characteristics (e.g. language, followers, friends)
Stress, Stress (Stressor and Stress Subject): Used word embeddings to identify terms likely to represent stressors and stress subjects; manually annotated 2,000 posts with stressor and subject, plus an additional 600 posts considered non-stress-related
Depression: Users detected by a group of psychologists with traditional diagnosis criteria through questionnaires and surveys
Suicidal Ideation: Regular expressions, subreddit participation, and manual annotation (4 levels of risk)
Grief, Aggression: Queried 2,000 tweets, retweets, and mentions of @TyquanAssassin; manually filtered out tweets that did not specifically reference the 2 associated deaths of the case study; manual coding of all remaining tweets
Eating Disorder: Sample 53 posts from eating disorder subreddits for manual binary annotation (positive for immense risk, negative otherwise); 6,000 additional posts sampled as being the "hottest" submissions on EatingDisorders, BingeEatingDisorder, eating_disorders, bulimia, proED, and fuckeatingdisorders; coders manually evaluated the top 50 (114 unique) most likely risky posts as labeled by 5 classifiers
Cyberbullying: Annotators were provided a detailed schematic for labeling; 2 levels (relation to cyberbullying, then type of cyberbullying); 7 types of cyberbullying
Aggression: All comments manually labeled with 3 levels of aggression: overtly aggressive, covertly aggressive, and non-aggressive
Depression: External data set of posts made in an online depression forum (Ramirez-Esparza et al. 2008) and an online breast cancer support forum (Gorbunova 2007); Reddit data set curated using post-level regular expressions (e.g. "I was diagnosed with") in the r/Depression subreddit; control users sampled from breast cancer, family, and relationship support subreddits; no manual verification (expect false positives)
Suicidal Ideation: Query the Twitter streaming API using keywords from the APA's list of risk factors and the AAS's list of warning signs related to suicide; identify 60 distressed users among the sample who frequently discuss depression, suicide, and self-mutilation (in addition to 60 other random users); remove 500 of the 5,446 tweets for manual labeling (no distress, minimal distress, moderate distress, severe distress)
Anorexia, Depression: Task 1 (Anorexia): regular expressions identify diagnosis disclosure; Task 2 (Depression): RSDD dataset, regular expressions; Task 3 (Level of Depression): Beck's Depression Inventory (BDI) questionnaire
Depression: Identified tweets using the #MyDepressionLooksLike hashtag; filtered down to original tweets from human authors (e.g. no PSAs or spam); each tweet manually coded by 2 annotators for theme (dysfunctional thoughts, lifestyle challenges, social struggles, hiding behind a mask, apathy and sadness, and relief seeking)
Antisocial Behavior: Search for tweets using phrases, e.g. "I do not care about the law", "I wish you die soon", and "Go to hell"; manually annotated all tweets as conveying antisocial behavior or not; a psychology graduate student verified the annotations
Suicide Attempt: Data from OurDataHelps.org (social media data + history of mental health); regular expressions to identify past suicide attempts + manual verification; age- and gender-matched controls
Bipolar Disorder: Regular expressions and flair within bipolar disorder subreddits; control groups sampled from over-indexing (non-bipolar) subreddits; filter out users with fewer than 1,000 words; filter out posts mentioning bipolar disorder
Depression: CES-D identified individuals with depression; interviews of participants coded manually by the authors for qualitative analysis
Schizophrenia: Essays come from patients with diagnosed schizophrenia (and healthy controls); Twitter data comes from users with self-disclosed diagnoses (e.g. regular expressions) and age- and gender-matched controls
Anorexia: Regular expressions identify diagnosis disclosure; controls mentioned anorexia or participated in discussion but did not have a diagnosis
Depression: Regular-expression matching against Twitter user profiles (phrases highly indicative of disease); filter out users with fewer than 100 tweets (no manual verification of authenticity)
Self-harm: Collect top-level posts from the "Self-harm", "Depression and Suicide", and "Friends and Family" subforums; restrict to users with self-identified age under 25 and gender male or female; 3 human annotators coded each post based on 11 topics
Depression (Symptoms): Leverage an annotation schema from prior work to label each tweet with symptomatology + relevance to depression
Anxiety, Borderline Personality, Bipolar, Opiate Addiction, Self-Harm, Addiction, Asperger's, Autism, Alcoholism, Opiate Usage, Schizophrenia, Self-Harm, Suicidal Ideation: Subreddit participation
Depression: CES-D
Mental Health (General): Tag-based proxy labeling
Depression: Regular expressions used to identify individuals with a diagnosis; control groups include users who often post about the disorder but do not have it (e.g. those supporting a peer or family member)
Depression, Anorexia: Regular expressions used to identify individuals with a diagnosis; control groups include users who often post about the disorder but do not have it (e.g. those supporting a peer or family member)
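Many of the labeling strategies above begin by matching self-reported diagnosis statements (e.g. "I was diagnosed with") with regular expressions and only then apply manual verification or matched control sampling. A minimal Python sketch of that first candidate-identification step, assuming a single hypothetical first-person pattern (`DIAGNOSIS_RE`) and a hypothetical set of target condition names (`TARGET_CONDITIONS`); neither is taken from any of the listed studies, and `candidate_disclosures` is an illustrative helper name:

```python
import re
from typing import Dict, List

# Hypothetical disclosure pattern: "I was / am / have been / got diagnosed with <word>".
# The studies listed above each used their own (often larger) pattern sets.
DIAGNOSIS_RE = re.compile(
    r"\bi\s+(?:was|am|have\s+been|got)\s+diagnosed\s+with\s+(\w+)",
    re.IGNORECASE,
)

# Hypothetical set of condition names to keep; any other captured word is ignored.
TARGET_CONDITIONS = {"depression", "ptsd", "bipolar", "anorexia", "schizophrenia"}


def candidate_disclosures(posts: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each user to the target conditions they appear to self-disclose.

    Matches are only candidates: many of the datasets above add manual
    verification before a user is treated as diagnosed.
    """
    candidates: Dict[str, List[str]] = {}
    for user, texts in posts.items():
        found: List[str] = []
        for text in texts:
            for match in DIAGNOSIS_RE.finditer(text):
                condition = match.group(1).lower()
                # Keep each condition once per user, in first-seen order.
                if condition in TARGET_CONDITIONS and condition not in found:
                    found.append(condition)
        if found:
            candidates[user] = found
    return candidates


if __name__ == "__main__":
    posts = {
        "u1": ["I was diagnosed with PTSD last year"],
        "u2": ["My friend was diagnosed with depression"],
        "u3": ["I got diagnosed with bipolar today", "I am diagnosed with depression"],
    }
    print(candidate_disclosures(posts))
```

Running the script prints `{'u1': ['ptsd'], 'u3': ['bipolar', 'depression']}`. The third-person statement about a friend (u2) is not matched, which is one reason to anchor such patterns on first-person phrasing; the remaining steps described above (manual verification, age- and gender-matched controls, filtering out posts that mention the condition) would follow this stage.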
Size | Availability
Train (326 depressed, 246 PTSD, 573 control); Test (150 depressed, 150 PTSD, 300 control) | Available via Signed Agreement
Bipolar: 394 individuals (992k tweets); Depression: 441 individuals (1.0M tweets); PTSD: 244 individuals (573k tweets); SAD: 159 individuals (421k tweets); Control: 5,728 individuals (13.7M tweets) | Available via Signed Agreement
ADHD: 102 individuals (384k tweets); Anxiety: 216 individuals (1.591M tweets); Borderline Personality: 101 individuals (321k tweets); Depression: 393 individuals (546k tweets); Eating: 238 individuals (724k tweets); OCD: 100 individuals (314k tweets); PTSD: 403 individuals (1.251M tweets); Schizophrenia: 172 individuals (493k tweets); Seasonal Affective: 100 individuals (340k tweets) | Available via Signed Agreement
253 conversations; 9,062 visitor messages; 5,320 counselor messages; 2,999 counselor paraphrases
165 individuals (137 w/o PPD, 28 w/ PPD); 578,200 data points (wall posts, videos, photos, links, and check-ins), separated into prenatal and postnatal periods
2,000 tweets
1,593 individuals
69 individuals (41 low-mild depression, 28 high depression); 5,706 tweets | Not Available
476 individuals
28,479 individuals | Not Available (MyPersonality Dataset)
208 individuals (81 depressed)

Depression: 105 individuals
PTSD: 63 individuals Not Available
PTSD: 244 individuals

Control: 6,100 individuals Available via Signed Agreement

Depression: 441 individuals (1.0M
tweets)
PTSD: 244 individuals (573k tweets)
SAD: 159 individuals (421k tweets)
Control: 5728 individuals (13.7M
tweets) Available via Signed Agreement
250 individuals (125 w/ past suicide attempt, 125 control) | Available via Signed Agreement
All individuals who posted in manually identified subreddits associated with each of the mental-health disorders | Reproducible via API
880 individuals (random sample, evenly split between general mental health and r/SuicideWatch) | Reproducible via API
21,734 mental-health comments + 21,734 control comments (~15k individuals each) | Reproducible via API
26M posts; 100k individuals
2,000 posts (14% high, 56% possibly, 29% safe) | Not Available
55 users
12,106 individuals total (4,947 depressed) | Reproducible via API
9,611 individuals with an average of 3,521 tweets per individual | Available via Signed Agreement
641 individuals in the increase group, 368 individuals in the no-change group, 758 individuals in the decrease group | Reproducible via API
469,849 counselor messages and 412,050 caller messages | CTL Research Fellows Only
33 individuals (189,478 messages) | Available Pending Future Exploration
469,849 counselor messages and 412,050 caller messages | CTL Research Fellows Only
65,024 forum posts (of which only 1,227 have labels) | Available via Signed Agreement
5,051 comments
108 individuals (2 months of logs)
934 individuals (sourced from r/SuicideWatch) + an equal number of control individuals | Available via Signed Agreement
33 individuals with multiple platforms (26 with SMS); > 1M incoming and outgoing messages | Available Pending Future Exploration
9,210 diagnosed individuals, 107,274 control individuals (control individuals selected based on the distance between subreddit probability distributions) | Available via Signed Agreement
2,000 tweets
31,554 posts from 496 users, based on "Expert, Crowdsourced, and Machine Assessment of Suicide Risk via Online Postings"; tasks differ in which subreddits the model has access to data from | Available via Signed Agreement
9,473 annotations for 9,300 tweets (9 depressive symptoms and 12 psychosocial stressors) | Available
52 individuals (data restricted) | Not Available
7,321 tweets
ADHD: 10,098 individuals; Anxiety: 8,783 individuals; Autism: 2,911 individuals; Bipolar: 6,434 individuals; Depression: 14,139 individuals; Eating: 598 individuals; OCD: 2,336 individuals; PTSD: 2,894 individuals; Schizophrenia: 1,331 individuals | Available via Signed Agreement
598 comments | Available via Signed Agreement
129 tweets | Available
66,059 posts from 19,159 individuals | Available
1,038 individuals (up to 2,000 messages)
909 individuals
1,053 individuals (6,754 posts)
440 individuals MH to SW (62,024 comments of support from 32,362 unique users); 440 individuals MH (41,894 comments of support from 21,358 unique users) | Reproducible via API
408 counselors; 3.2 million messages; 80,885 conversations | Available
2,000 posts (14% high, 56% possibly, 29% safe) | Not Available
49,113 threads; 500,754 posts; 75,000 individuals
3,554 labeled data points for 2,929 posts | Available
Last Statements: 431 notes; Suicide Notes: 161 notes; Depression Notes: 142 notes
7,329 individuals
Candidate disclosures: 51,038,914 posts from 470,337 individuals; Control: 66,214,850 posts from 480,685 individuals
733,011 tweets from 594,776 individuals | Reproducible via API
CLPsych: Train (326 depressed, 246 PTSD, 573 control); Test (150 depressed, 150 PTSD, 300 control); Bell Lets Talk: 154 individuals (53 depressed) | Available via Signed Agreement
Natural Stress Emotion Corpus: 38 student interviews with 2,272 binary-labeled utterances; Stress Twitter Corpus: 367,312 tweets (59,768 stressed) | Reproducible via API (but lacking clear instructions)
Not stated
154 individuals (53 depressed)
683 patients (114 depressed) | Not Available (PHI restrictions)
SWLS: 1,298 SWLS < 25, 785 SWLS ≥ 25; CES-D: 148 CES-D < 20, 466 CES-D ≥ 20 | Not Available (MyPersonality Dataset)
166 individuals (71 of whom had a history of depression) | Not Available (confidentiality restrictions)
749 individuals | Not Available (protocol restrictions)
Posts per class: Depression (18,203), Drug Use (138,021), Drinking (4,979), Sleep Disorder (4,758), Eating Disorder (234); no control group, the task is only choosing which disorder
1,188 labeled posts (40 crisis, 137 red, 296 amber, 715 green) | Available via Signed Agreement
1,040 threads | Available via Signed Agreement
785 posts
r/loseit: 2.3 million comments in 164k posts; r/proED: 123k comments from 8.5k posts | Reproducible via API
200 individuals (209,290 tweets) | Reproducible via API
Number of comments: Borderline (11,880), Bipolar (41,636), Schizophrenia (4,963), Anxiety (57,523), Depression (197,436), Self-harm (17,102), Suicidality (90,518), Addiction (4,360), Opiates (65,143), Autism (9,470), Control (476,388) | Reproducible via API
1,569 documents (523 anxiety concatenated histories, 523 anxiety concatenated histories from non-anxiety forums, 523 comparison concatenated histories from members not in non-anxiety forums) | Reproducible via API
Number of comments: Borderline (11,880), Bipolar (41,636), Schizophrenia (4,963), Anxiety (57,523), Depression (197,436), Self-harm (17,102), Suicidality (90,518), Addiction (4,360), Opiates (65,143), Autism (9,470), Control (476,388) | Reproducible via API
400 posts per data set group | Reproducible via API
Up to 2,000 posts per individual; final data set had 531,453 submissions from 892 users (125 depressed)
Randomly balanced classes, sampling ~2,000 posts total for their data set | Reproducible via API
Anxiety (9,971 posts), Control (12,837 posts) | Reproducible via API
974 respondents: 117 had WSC, 190 high suicide risk, 49 severe depression, 140 severe anxiety, 45 severe stress
7,314 posts (664 suicide)
492,676 posts (239,038 stressed); 23,304 individuals (11,074 stressed) | Reproducible via API
57,785 tweets (14,931 not stressed)
122 depressed, 346 non-depressed individuals (6,013 posts total)
Weibo: 580 depressed, 580 control; Twitter (Shen 2017): 1,394 depressed, 1,394 control
36 individuals (21,648 tweets)
13,317 users (2,353 recovery, 10,964 non-recovery); 68MM posts (25MM recovery, 42MM non-recovery); posts shared between 2/20/2007 and 8/4/2014 | Reproducible via API
Anorexia: 55,334 posts (11,301 pro-recovery, 44,033 pro-ana) from 18,293 users; Control: 32,000 posts | Reproducible via API
459 posts (206 distorted, 252 undistorted)
Schizophrenia authenticity: 146 yes, 101 maybe, 424 no users (1.9M, 1.5M, and 8.8M tweets, respectively); additional sample of 100 users (18 authenticated positive by experts)
135 participants (17 suicidal)
816 tweets (13% show evidence of possible suicidal ideation)
376 validated new mothers (36,948 prenatal posts, 40,426 postnatal posts)
Depression (117 users, 23,984 posts); Control (157 users, 45,530 posts) | Not Available
2000 tweets
Anxiety (2,408 users), eating disorder (749 users), panic attacks (263 users), schizophrenia (350 users), and suicide attempt (424 users)
Schizophrenia (96 users), Control (200 users)
Schizophrenia (174 users), Control (174 users)
Depression (Spanish: 3,253 tweets [160 positive], Portuguese: 2,846 tweets [120 positive]); Eating Disorders (Spanish: 412 tweets [111 positive], Portuguese: 468 tweets [87 positive])
CLPsych data (~600 depression + ~600 age- and gender-matched controls); stream-of-consciousness essays (6,459 individuals)
EMA (51 participants, 1,606 responses), Facebook CL Study (23 participants, 13,340 status updates), Twitter CL (10 participants, 1,425 tweets); Twitter Bipolar (6,326 users, 14M tweets); Twitter Borderline (3,238 users, 7M tweets); Twitter Control (9,394 users, 15M tweets)
Bipolar Disorder (278 users), Borderline Personality Disorder (203 users), Control (548 users)
Facebook (538 status updates, 29 users), Twitter (1,318 posts, 49 users)
Depression: 1,402 users (292,564 tweets); Control: >300M users (>10 billion tweets); Candidate Depression: 36,993 users (35M tweets) | Freely available for download
50 participants
Depression (50 users); Control (100 users)
Eating disorder (3,380 users), Random Control (30,684 users), Young + Female Control (37,983 users)
Depression (100 authors), Non-Depression (100 authors)
Clinical (38,401 posts from 24 communities), Control (229,563 posts from 23 communities)
Self-harm (20,495 users, 93,286 posts), Control (19,720 users, 93,286 randomly sampled posts)
Gossiping (1,699 users, 6,505 posts), Happy (2,695 users, 11,209 posts), Prozac (1,027 users, 6,015 posts), Sad (1,652 users, 4,900 posts)
Suicide (9,990 users), Depression (24,410 users), Control Group (228,949 users)
Affiliation: 1,847 users; Self-report: 412 users; Clinically-appraised Self-report: 153 users; Patients: 88 patients; Control: equal number for each type
Stress (2,000 posts), Control (600 posts) | Available
Depression (90 users), Non-depression (90 users)
31,554 posts from 496 users, based on "Expert, Crowdsourced, and Machine Assessment of Suicide Risk via Online Postings"; tasks differ in which subreddits the model has access to data from | Available via Signed Agreement
General (718 tweets)
Positive (38 posts), Negative (15 posts), Unlabeled (6,000 posts)
English (113,698 posts [5,375 cyberbullying]), Dutch (78,387 posts [5,106 cyberbullying]) | Available via Signed Agreement
Facebook (15,000 comments), Twitter (1,257 English + 1,194 Hindi tweets) | Available
Depression (1,293 posts), Non-depression (548 posts) | Reproducible via API
Depression (60 users, 2,381 tweets), Non-depression (60 users, 3,065 tweets)
Task 1: Anorexia (61 users, 24,874 posts), Non-anorexia (411 users, 228,878 posts); Task 2: Depression (9,210 users), Non-depression (107,274 users); Task 3: Level of Depression (20 users) | Available via Signed Agreement
1,978 tweets
55,810 tweets
Suicide Attempt (418 users, 197,615 posts), Controls (418 users, 197,615 posts)
Bipolar Disorder (3,488 users); Control (3,931 users) | Reproducible via API
Depression (7 participants), Non-depression (7 participants) | Not Available
Essays: Schizophrenia (93 patients), Control (95 patients); Twitter: Schizophrenia (174 users), Control (174 users)
Anorexia (61 users, 24,874 posts), Non-anorexia (411 users, 228,878 posts) | Available via Signed Agreement
Depression (2,000 users), Control (2,000 users)
2,359 posts
9,300 tweets
All individuals who posted in manually identified subreddits associated with each of the mental-health disorders | Reproducible via API
45 individuals | Not Available
Mental Illness (770 users, 14,781 posts); Pre-mental Illness (658 users, 11,828 posts); Healthy Users (15,000 users, 15,000 posts) | Reproducible via API
Depression (135 users, 49,557 posts); Control (752 users, 481,337 posts) | Available via Signed Agreement
Task 1: Depression (214 users, 90,222 posts), Control (831 users, 0.9M posts); Task 2: Anorexia (61 users, 24,874 posts), Control (411 users, 228,878 posts) | Available via Signed Agreement
Additional Comments | Dataset Link (if any)
Japanese Language Only
Korean Language Only
Chinese Language Only
http://ir.cs.georgetown.edu/resources/rsdd.html
https://research.cs.wisc.edu/bullying/data.html
s/smhd.html
s/
Pilot Study

s/rsdd.html
http://snap.stanford.edu/counseling/
s/rsdd.html
http://www.cs.columbia.edu/~eturcan/data/dreaddit.zip
Not a predictive task, but rather a tracking study.
Bell Lets Talk dataset detailed in Jamil et al., 2017 (Master's thesis)
CLPsych 2016 dataset

Same dataset as "The language of mental health problems in social media" (2016)
Same dataset as "The language of mental health problems in social media" (2016)
Leverages external data sets from Ramirez-Esparza 2008 and Gorbunova 2007 (online support forums)
eRisk 2017 dataset
Also identified data in college subreddits (pre- and post-campus shooting incidents), but it does not have any ground truth associated with it.
Chinese language only; also examined performance on 3 other datasets (construction unclear) on Sina Weibo, Tencent Weibo, and Twitter
Chinese language; also includes images
Uses Shen et al. 2017 for Twitter data set
English only
Spanish, Portuguese language from Spain and Portugal, respectively
Leverages existing CLPsych 2015 dataset for depression classification
CL: CampusLife Study at Georgia Tech
http://depressiondetection.droppages.com

Chinese Language
English | http://stressmeasure.droppages.com
Data comes from "A depression detection model based on sentiment analysis in micro-blog social network"; Chinese language only
https://osf.io/rgqw8/
Subset of Kumar et al. (2018); English + Hindi versions | http://trac1-dataset.kmiagra.org/
Leverages data from Pirina et al. (2018)
Task 1 comes from eRisk 2018, Task 2 comes from RSDD (Georgetown), Task 3 is new data
eRisk 2019 Dataset
Builds upon Mowery et al. 2016
From Gkotsis et al. (2016)
Korean Language Only

Dataset originally proposed in Losada et al. (2016) "A test collection for research on depression and language use"
Reference Link
https://www.aclweb.org/anthology/W15-1204/
W14-3207/
W15-1201/
W19-3001/
http://www.munmund.net/pubs/cscw_14_1.pdf
https://www.sciencedirect.com/science/article/pii/S0747563215300996
https://pdfs.semanticscholar.org/e886/3d0ace1ad50f2fd9bc64ea953df8271a60c1.pdf
https://pdfs.semanticscholar.org/8dd5/8913bd343f4ef23b8437b24e152d3270cdaf.pdf
https://www.aaai.org/ocs/index.php/ICWSM/ICWSM13/paper/viewFile/6124/6351
W14-3214/
https://dl.acm.org/citation.cfm?id=2702280
https://www.nature.com/articles/s41598-017-12961-9
ICWSM/ICWSM14/paper/viewFile/8079/8082
W15-1203/
W16-0311/
https://arxiv.org/pdf/1607.07384.pdf
W16-0307/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5659860/
rticles/PMC5565736/
http://www.munmund.net/pubs/cscw16_MIS.pdf
https://www.sciencedirect.com/science/article/pii/S2214782915000160
https://www.jmir.org/2013/10/e217/
W18-4102/
https://arxiv.org/abs/1712.03538
http://lit.eecs.umich.edu/files/LiMihalceaWilson_Socinfo_18.pdf
https://dspace.mit.edu/bitstream/handle/1721.1/110590/Picard_Mixed-initiative.pdf?sequence=1&isAllowed=y
https://osf.io/6r2nq
https://www.jmir.org/2019/1/e11507/
W16-0312/
https://link.springer.com/chapter/10.1007/978-3-030-02686-8_30
https://www.cs.rochester.edu/u/kautz/papers/www_2019_detecting_low_selfesteem_4089820.pdf
https://s3.amazonaws.com/academia.edu.documents/56757600/W18-0603.pdf?response-content-disposition=inline%3B%20filename%3DExpert_Crowdsourced_and_Machine_Assessme.pdf&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWOWYYGZ2Y53UL3A%2F20191108%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20191108T130308Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=f6df0e59e77b9b066b854be2d29b7756d6cc6e8c4dbaecdb41ac061816374aef
rticles/PMC6442737/
W19-3003/
https://www.researchgate.net/profile/Marina_Litvak/publication/312591227_Social_and_linguistic_behavior_and_its_correlation_to_trait_empathy/links/58877a724585150dde501df8/Social-and-linguistic-behavior-and-its-correlation-to-trait-empathy.pdf#page=196
rticles/PMC6736249/
http://pages.cs.wisc.edu/~jerryzhu/pub/junming-thesis.pdf
http://ir.cs.georgetown.edu/downloads/macavaney-clpsych2018.pdf
W15-1211/
rticles/PMC5507358/
1007/978-3-319-15554-8_45
https://www.ncbi.nlm.nih.gov/pubmed/26543921
https://www.microsoft.com/en-us/research/publication/language-social-support-social-media-effect-suicidal-ideation-risk/
W19-3013/
https://www.aclweb.org/anthology/Y18-1070/
http://timalthoff.com/docs/althoff-2016-mental_health.pdf
https://www.aclweb.org/anthology/U16-1010/
https://aisel.aisnet.org/cgi/viewcontent.cgi?article=1269&context=wi2019
https://www.aclweb.org/anthology/D19-5542/
19-6208/
19-6213/
19-6217/
19-1181/
https://dl.acm.org/citation.cfm?id=2998220
https://econtent.hogrefe.com/doi/abs/10.1027/0227-5910/a000234?journalCode=cri
W18-0609
https://ieeexplore.ieee.org/abstract/document/8461990
document/8554733/
https://paulallen.ca/docs/Jamil,%20Z%20Monitoring%20tweets%20for%20depression%20to%20detect%20at-risk%20users%20-%202017.pdf
https://www.pnas.org/content/115/44/11203
document/8419355
https://epjdatascience.springeropen.com/articles/10.1140/epjds/s13688-017-0110-z
https://www.jmir.org/2018/12/e11817/
1007/978-3-319-67256-4_7
https://asistdl.onlinelibrary.wiley.com/doi/full/10.1002/asi.23865
https://dl.acm.org/doi/abs/10.1145/3173574.3174240
http://www.munmund.net/pubs/Anxiety_SocialInt_ICWSM18.pdf
https://www.nature.com/articles/srep45141
W18-0620/
W18-0607/
W18-5903/
159652.3159725
134727
W17-3107/
https://www.aclweb.org/anthology/Y15-1064/
647868.2654945
document/6890213
1007/978-3-642-40319-4_18
https://eprints.soton.ac.uk/423226/
1007/978-3-319-25261-2_3
858036.2858246
750511.2750515
document/8031202
https://mental.jmir.org/2016/2/e21/
700171.2791023
470654.2466447
464464.2464480
W14-3213/
W17-3110/
rticles/PMC4525233/
W15-1202/
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0086191
W15-1212/
130960
document/7752434
https://nextcenter.org/wp-content/uploads/2018/02/Depression-Detection-via-Harvesting-Social-Media-A-Multimodal-Dictionary-Learning-Solution.pdf
https://ieeexplore.ieee.org/document/6549431
079452.3079465
018661.3018706
SSS/SSS14/paper/viewFile/7744/7782
document/6784326
038912.3052555
1007/978-3-642-37453-1_23
cle?
290605.3300364
https://hcsi.cs.tsinghua.edu.cn/Paper/Paper16/IJCAI-linhuijie.pdf
document/6753906
W19-3005/
https://journals.sagepub.com/doi/full/10.1177/1178222618763155
https://onlinelibrary.wiley.com/doi/full/10.1002/eat.23148
cle?
W18-4401/
document/8681445
document/8269767
http://ceur-ws.org/Vol-2380/paper_66.pdf
https://mental.jmir.org/2017/4/e43/
1007/978-3-030-22354-0_43
https://journals.sagepub.com/doi/full/10.1177/1178222618792860
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.941.6087&rep=rep1&type=pdf
http://ceur-ws.org/Vol-2380/paper_74.pdf
110025.3123028
https://onlinelibrary.wiley.com/doi/full/10.1111/sltb.12569
269206.3271732
https://info.computer.org/csdl/proceedings-article/bigcomp/2016/07425918/12OmNAkEU5D
https://www.aclweb.org/anthology/2020.lrec-1.772/
1007/978-3-319-65813-1_30
https://tec.citius.usc.es/ir/pdf/eRisk2018LNCS.pdf

Quantifying and Predicting Mental Illness

O'Dea, Wan, Batterham, Calear, Paris,

Detecting Linguistic Traces of Depression

Multi-Task Learning for Mental Health

Text-based Detection and Understanding

Quantifying Mental Health from Social Amir, Coppersmith, Carvalho, Silva,

Can Text Messages Identify Suicide Risk in

Protecting User Privacy and Rights in

Detecting Low Self-Esteem in Youths from

Identification of Imminent Suicide Risk Nobles, Glenn, Kowsari, Teachman,

Depression and Self-Harm Risk

Learning from various labeling strategies

CLPsych 2019 Shared Task: Predicting the

Can acute suicidality be predicted by

SMHD: A Large-Scale Resource for

Towards Developing an Annotation

Detecting Changes in Suicide Content

Using Linguistic Features to Estimate

Identifying Chinese Microblog Users with

The Language of Social Support in Social

Mental Health Surveillance over Social

Feature Attention Network: Interpretable

Natural Language Processing for Mental

User Dynamics in Mental Health Forums

Adapting Deep Learning Methods for

Multi-Task, Multi-Channel, Multi-Input

Dilated LSTM with attention for

Gender and Cross-Cultural Differences in De Choudhury, Sharma, Logar,

Tracking Suicide Risk Factors Through Jashinsky, Burton, Hanson, West,

Deep Learning for Depression Detection Husseini Orabi, Buddhitha, Husseini

Attention-based LSTM for Psychological

Suicidal Trend Analysis of Twitter using Shahreen, Subhani, Mahfuzur

Monitoring Tweets for Depression to

Eichstaedt, Smith, Merchant, Ungar,

Instagram photos reveal predictive

Exploring the utility of community-

Predicting Multiple Risky Behaviors via

Helping or hurting? predicting changes in

Detecting suicidal ideation on forums: Aladağ, Muderrisoglu, Akbas,

Norms matter: contrasting social support

Characterization of mental health

Within and between-person differences

Hierarchical neural model with attention

Measuring the latency of depression

Modeling Stress with Social Media

Detecting anxiety on Reddit Hanwen Shen, Rudzicz

Topic Model for Identifying Suicidal

User-level psychological stress detection

Psychological stress detection from cross-