Paper Authors

CLPsych 2015 Shared Task: Depression Coppersmith, Dredze, Harman,


and PTSD on Twitter Hollingshead, Mitchell

Quantifying Mental Health Signals in


Twitter Coppersmith, Dredze, Harman

From ADHD to SAD: Analyzing the


Language of Mental Health on Twitter Coppersmith, Dredze, Harman,
through Self-Reported Diagnoses Hollingshead

Towards Augmenting Crisis Counselor


Training by Improving Message Retrieval DeMasi, Hearst, Recht
Characterizing and Predicting Postpartum
Depression from Shared Facebook Data De Choudhury, Counts, Horvitz, Hoff

Content Analysis of Depression-Related Cavazos-Rehg, Krauss, Sowles,


Tweets Connolly, Rosas, Bharadwaj, Bierut

Cross-cultural differences in language Loveys, Torrez, Fine, Moriarty,


markers of depression online Coppersmith

Depressive Moods of Users Portrayed in


Twitter Park, Cha, Cha

De Choudhury, Gamon, Counts,


Predicting Depression via Social Media Horvitz

Towards Assessing Changes in Degree of Schwartz, Eichstaedt, Kern, Park, Sap,


Depression through Facebook Stillwell, Kosinski, Ungar
Recognizing Depression From Twitter Tsugawa, Kikuchi, Kishino, Nakajima,
Activity Itoh, Ohsaki
Forecasting the Onset and Course of Reece, Reagan, Lix, Dodds, Danforth,
Mental Illness with Twitter Data Langer

Measuring post traumatic stress disorder


in Twitter Coppersmith, Harman, Dredze

The role of personality, age, and gender in Preotiuc-Pietro, Eichstaedt, Park, Sap,
tweeting about mental illnesses Smith, Tobolsky, Schwartz, Ungar
Exploratory Analysis of Social Media Prior
to a Suicide Attempt Coppersmith, Ngo, Leary, Wood
Identifying Depression on Twitter Nadeem, Horn, Coppersmith, Sen

The language of mental health problems Gkotsis, Oellrich, Hubbard, Dobson,


in social media Liakata, Velupillai, Dutta

Discovering shifts to suicidal ideation


from mental health content in social De Choudhury, Kiciman, Dredze,
media Coppersmith, Kumar

Social Media Based Index of Mental Well-


Being in College Campuses Bagroy, Kumaraguru, De Choudhury

Quantifying and Predicting Mental Illness


Severity in Online Pro-Eating Disorder Chancellor, Lin, Goodman, Zerwas, De
Communities Choudhury

O'Dea, Wan, Batterham, Calear, Paris,


Detecting Suicidality on Twitter Christensen
Activities on Facebook Reveal the
Depressive State of Users Park, Lee, Kwak, Cha, Jeong

Detecting Linguistic Traces of Depression


in Topic-Restricted Text: Attending to Self-
Stigmatized Depression with NLP Wolohan, Hiraga, Mukherjee, Sayyed

Multi-Task Learning for Mental Health


using Social Media Text Benton, Mitchell, Hovy

Text-based Detection and Understanding


of Changes in Mental Health Li, Mihalcea, Wilson
Mixed-Initiative Real-Time Topic
Modeling & Visualization for Crisis Dinakar, Chen, Lieberman, Picard,
Counseling Filbin

Quantifying Mental Health from Social Amir, Coppersmith, Carvalho, Silva,


Media with Neural User Embeddings Wallace

Can Text Messages Identify Suicide Risk in


Real Time? A within-subjects pilot
examination of temporally sensitive
markers of suicide risk Glenn, Nobles, Barnes, Teachman

Protecting User Privacy and Rights in


Academic Data-Sharing Partnerships: Pisani, Kanuri, Filbin, Gallo, Gould,
Principles from a pilot program at Crisis Lehmann, Levine, Marcotte, Pascal,
Text Line Rousseau, Turner, Yen, Ranney
CLPsych 2016 Shared Task: Triaging
Content in Online Peer Support Forums Milne, Pink, Hachey, Calvo
Detecting Comments Showing Risk for
Suicide in YouTube Gao, Cheng, Yu

Detecting Low Self-Esteem in Youths from


Web Search Data Zaman, Acharyya, Kautz, Silenzio
Expert, Crowdsourced, and Machine
Assessment of Suicide Risk via Online Shing, Nair, Zirikly, Friedenberg,
Postings Daumé III, Resnik

Identification of Imminent Suicide Risk Nobles, Glenn, Kowsari, Teachman,


Among Young Adults using Text Messages Barnes

Depression and Self-Harm Risk


Assessment in Online Forums Yates, Cohan, Goharian

Learning from various labeling strategies


for suicide-related messages on social
media: An experimental study Liu, Chen, Homan, Silenzio

CLPsych 2019 Shared Task: Predicting the


Degree of Suicide Risk in Reddit Posts Zirikly, Resnik, Uzuner, Hollingshead
Towards Automatically Classifying
Depressive Symptoms from Twitter Data
for Population Health Mowery, Park, Conway, Bryan

Can acute suicidality be predicted by


Instagram data? Results from qualitative Brown, Bendig, Fischer, Goldwich,
and quantitative language analyses Baumeister, Plener
Understanding and Fighting Bullying with
Machine Learning Junming Sui

SMHD: A Large-Scale Resource for


Exploring Online Language Usage for Cohan, Desmet, Yates, Soldaini,
Multiple Mental Health Conditions MacAvaney, Goharian
RSDD-Time: Temporal Annotation of Self- MacAvaney, Desmet, Cohan, Soldaini,
Reported Mental Health Diagnoses Yates, Zirikly, Goharian

Towards Developing an Annotation


Scheme for Depressive Disorder
Symptoms: A Preliminary Study using
Twitter Data Mowery, Bryan, Conway

Detecting Changes in Suicide Content


Manifested in Social Media Following Kumar, Dredze, Coppersmith, De
Celebrity Suicides Choudhury

Using Linguistic Features to Estimate


Suicide Probability of Chinese Microblog
Users Zhang, Huang, Liu, Chen, Zhu

Identifying Chinese Microblog Users with


High Suicide Probability Using Internet-
Based Profile and Linguistic Features:
Classification Model Guan, Hao, Cheng, Yip, Zhu
Detecting Suicidal Ideation in Chinese
Microblogs with Psychological Lexicons Huang, Zhang, Liu, Chiu, Li, Zhu

The Language of Social Support in Social


Media and its Effect on Suicidal Ideation
Risk De Choudhury, Kiciman

Mental Health Surveillance over Social


Media with Digital Cohorts Amir, Dredze, Ayers

Feature Attention Network: Interpretable


Depression Detection from Social Media Song, You, Chung, Park

Natural Language Processing for Mental


Health: Large Scale Discourse Analysis of
Counseling Conversations Althoff, Clark, Leskovec
The Role of Features and Context on
Suicide Ideation Detection Wang, Wan, Paris

User Dynamics in Mental Health Forums


-- A Sentiment Analysis Perspective Davcheva, Adam, Benlian

Adapting Deep Learning Methods for


Mental Health Prediction on Social Media Sekulic, Strube

Multi-Task, Multi-Channel, Multi-Input


Learning for Mental Illness Detection
using Social Media Text Kirinde Gamaarachchige, Inkpen
Dreaddit: A Reddit Dataset for Stress
Analysis in Social Media Turcan, McKeown

Dilated LSTM with attention for


Classification of Suicide Notes Schoene, Lacy, Turner, Dethlefs
Latent Suicide Risk Detection on
Microblog via Suicide-Oriented Word
Embeddings and Layered Attention Cao, Zhang, Feng, Wei, Wang, Li, He

Gender and Cross-Cultural Differences in De Choudhury, Sharma, Logar,


Social Media Disclosures of Mental Illness Eekhout, Clausen Nielsen

Tracking Suicide Risk Factors Through Jashinsky, Burton, Hanson, West,


Twitter in the US Giraud-Carrier, Barnes, Argyle

Deep Learning for Depression Detection Husseini Orabi, Buddhitha, Husseini


of Twitter Users Orabi, Inkpen

Attention-based LSTM for Psychological


Stress Detection from Spoken Language
Using Distant Supervision Winata, Pepijn Kampman, Fung

Suicidal Trend Analysis of Twitter using Shahreen, Subhani, Mahfuzur


Machine Learning and Neural Network Rahman

Monitoring Tweets for Depression to


Detect At-risk Users Jamil

Eichstaedt, Smith, Merchant, Ungar,


Facebook language predicts depression in Crutchley, Preotiuc-Pietro, Asch,
medical records Schwartz
A multilevel predictive model for
detecting social network users with
depression Wongkoblap, Vadillo, Curcin

Instagram photos reveal predictive


markers of depression Reece, Danforth

Exploring the utility of community-


generated social media content for
detecting depression: an analytical study
on Instagram Ricard, Marsch, Crosier, Hassanpour

Predicting Multiple Risky Behaviors via


Multimedia Content Zhou, Zhang, Luo
Triaging content severity in online mental
health forums Cohan, Young, Yates, Goharian

Helping or hurting? predicting changes in


users’ risk of self-harm through online Soldaini, Walsh, Cohan, Han,
community interactions Goharian

Detecting suicidal ideation on forums: Aladağ, Muderrisoglu, Akbas,


proof-of-concept study Zahmacioglu, Bingol

Norms matter: contrasting social support


around behavior change in online weight
loss communities Chancellor, Hu, De Choudhury
Measuring the impact of anxiety on
online social interactions Dutta, Ma, De Choudhury

Characterization of mental health


conditions in social media using Informed Gkotsis, Oellrich, Velupillai, Liakata,
Deep Learning Hubbard, Dobson, Dutta

Within and between-person differences


in language used across anxiety support
and neutral Reddit communities Ireland, Iserman

Hierarchical neural model with attention


mechanisms for the classification of social
media text related to mental health Ive, Gkotsis, Dutta, Stewart, Velupillai
Identifying depression on Reddit: The
effect of training data Pirina, Çöltekin

Measuring the latency of depression


detection in social media. Sadeque, Xu, Bethard

Modeling Stress with Social Media


Around Incidents of Gun Violence on
College Campuses Saha, De Choudhury

Detecting anxiety on Reddit Hanwen Shen, Rudzicz


Assessing suicide risk and emotional
distress in Chinese social media: a text
mining and machine learning study Cheng, Li, Kwok, Zhu, Yip

Topic Model for Identifying Suicidal


Ideation in Chinese Microblog Huang, Li, Zhang, Liu, Chiu, Zhu

User-level psychological stress detection


from social media using deep neural
network Lin, Jia, Guo, Xue, Li, Huang, Cai, Feng

Psychological stress detection from cross-


media microblog data using deep sparse
neural network Lin, Jia, Guo, Xue, Li, Huang, Cai, Feng

A depression detection model based on


sentiment analysis in micro-blog social
network Wang, Zhang, Ji, Sun, Wu, Bao
Cross-domain depression detection via Shen, Jia, Shen, Feng, He, Luan, Tang,
harvesting social media Tiropanis, Chua, Hall

Teenagers’ stress detection based on


time-sensitive microblog
comment/response actions Zhao, Jia, Feng

Recovery Amid Pro-Anorexia: Analysis of


Recovery in Social Media Chancellor, Mitra, De Choudhury

Anorexia on Tumblr: A Characterization


Study on Anorexia De Choudhury

Detecting cognitive distortions through Simms, Ramstedt, Rich, Richards,


machine learning text analytics Martinez, Giraud-Carrier
A collaborative approach to identifying
social media markers of schizophrenia by
employing machine learning and clinical Birnbaum, Kiranmai Ernala, Rizvi, De
appraisals Choudhury, Kane

Validating machine learning algorithms


for Twitter data against established Braithwaite, Giraud-Carrier, West,
measures of suicidality Barnes, Lee Hanson

Machine Classification and analysis of


suicide-related communication on Twitter Burnap, Colombo, Scourfield
Predicting postpartum changes in
emotion and behavior via social media De Choudhury, Counts, Horvitz

Social Media As a Measurement Tool of


Depression in Populations De Choudhury, Counts, Horvitz

Toward Macro-Insights for Suicide


Prevention: Analyzing Fine-Grained
Distress at Scale Homan, Johar, Liu, Lytle, Silenzio, Alm
Small but Mighty: Affective Micropatterns
for Quantifying Mental Health from Social Loveys, Crutchley, Wyatt,
Media Language Coppersmith

Mining Twitter data to improve detection McManus, Mallory, Goldfeder,


of schizophrenia Haynes, Tatum

Quantifying the language of schizophrenia


in social media Mitchell, Hollingshead, Coppersmith

Twitter: a good place to detect health Prieto, Matos, Alvarez, Cacheda,


conditions Oliveira

Beyond LDA: exploring supervised topic


modeling for depression-related language Resnik, Armstrong, Claudino, Nguyen,
in Twitter Nguyen, Boyd-Graber
Inferring Mood Instability on Social Media
by Leveraging Ecological Momentary Saha, Chan, Barbaro, Abowd, De
Assessments Choudhury

MIDAS: Mental illness detection and Saravia, Chang, Jollet De Lorenzo,


analysis via social media Chen

Predicting depression from language-


based emotion dynamics: longitudinal
analysis of Facebook and Twitter status
updates Seabrook, Kern, Fulcher, Rickard

Depression detection via harvesting social


media: A multimodal dictionary learning
solution Shen, Jia, Feng, Zhang, Hu, Chua, Zhu

On estimating depressive tendency of Tsugawa, Mogi, Kikuchi, Kishino,


Twitter users from their tweet data Fujita, Itoh, Ohsaki

Emotional and Linguistic Cues of


Depression from Social Media Vedula, Parthasarathy
Detecting and Characterizing Eating-
Disorder Communities on Social Media Wang, Brede, Ianni, Mentzakis

Defining patients with depressive


disorder by using textual information Nakamura, Kubo, Usuda, Aramaki

Affective and content analysis of online


depression communities Nguyen, Phung, Dao, Venkatesh, Berk

Understanding and Discovering


Deliberate Self-harm Content in Social Wang, Tang, Li, Li, Wan, Mellina,
Media O'Hare, Chang
Exploiting Temporal Information in a Two-
Stage Classification Framework for
Content-Based Depression Shen, Kuo, Chen, Lin

Suicide Ideation of Individuals in Online


Social Networks Masuda, Kurahashi, Onari

Methodological Gaps in Predicting Mental


Health States from Social Media: Ernala, Birnbaum, Candan, Rizvi,
Triangulating Diagnostic Signals Sterling, Kane, De Choudhury

What does social media say about your


stress? Lin, Jia, Nie, Shen, Chua

An improved model for depression


detecting in micro-blog social network Wang, Zhang, Sun
Matero, Idnani, Son, Giorgi, Vu,
Suicide Risk Assessment with Multi-level Zamani, Limbachiya, Guntuku,
Dual-Context Language and BERT Schwartz

Accommodating Grief on Twitter: An


Analysis of Expressions of Grief Among
Gang Involved Youth on Twitter Using
Qualitative Analysis and Natural Language Upton Patton, MacBeth,
Processing Schoenebeck, Shear, McKeown

Automatic detection of eating disorder-


related social media posts that could Yan, Fitzsimmons-Craft, Goodman,
benefit from a mental health intervention Krauss, Das, Cavazos-Rehg

Van Hee, Jacobs, Emmery, Desmet,


Automatic detection of cyberbullying in Lefever, Verhoeven, De Pauw,
social media text Daelemans, Hoste

Benchmarking Aggression Identification in


Social Media Kumar, Ojha, Malmasi, Zampieri
Detection of Depression-related Posts in
Reddit Social Media Forum Tadesse, Lin, Xu, Yang

Detection of Suicide-Related Posts in


Twitter Data Streams Vioulès, Moulahi, Azé, Bringay

BioInfo@UAVR at eRisk 2019: delving into


social media texts for the early detection
of mental and food disorders Trifan, Luís Oliveira
#MyDepressionLooksLike: Examining
Public Discourse About Depression on Lachmar, Wittenborn, Bogen,
Twitter McCauley

A Framework for Early Detection of


Antisocial Behavior on Twitter Using Singh, Du, Zhang, Wang, Miao,
Natural Language Processing Sianaki, Ulhaq

Natural Language Processing of Social


Media as Screening for Suicide Risk Coppersmith

Not Just Depressed: Bipolar Disorder


Prediction on Reddit Sekulić, Gjurković, Šnajder

Perception Differences between the


Depressed and Non-Depressed Users in
Twitter Park, McDonald, Cha
Predictive linguistic features of Sarioglu Kayi, Diab, Pauselli,
schizophrenia Compton, Coppersmith

Quick and (maybe not so) Easy Detection


of Anorexia in Social Media Posts Mohammadi, Amini, Kosseim

Semi-Supervised Approach to Monitoring Hossein Yazdavar, Al-Olimat,


Clinical Depressive Symptoms in Social Ebrahimi, Bajaj, Banerjee,
Media Thirunarayan, Pathak, Sheth

Using Topic Modeling to Detect and


Describe Self-Injurious and Related
Content on a Large-Scale Digital Platform Franz, Nook, Mair, Nock

Understanding Depressive Symptoms and


Psychosocial Stressors on Twitter: A Mowery, Smith, Cheney, Stoddard,
Corpus-Based Study Coppersmith, Bryan, Conway

"Let Me Tell You About Your Mental


Health!" Contextualized Classification
of Reddit Posts to DSM-5 for Web-based Gaur, Kursuncu, Alambo, Sheth,
Intervention Daniulaityte, Thirunarayan, Pathak

Identifying Depressive Users in Twitter


Using Multimodal Analysis Kang, Yoon, Yi Kim
Inferring Social Media Users' Mental
Health Status from Multimodal
Information Xu, Pérez-Rosas, Mihalcea

eRISK 2017: CLEF Lab on Early Risk


Prediction on the Internet: Experimental
Foundation Losada, Crestani, Parapar

Overview of eRisk: Early Risk Prediction


on the Internet Losada, Crestani, Parapar
Year Platform

2015 Twitter

2014 Twitter

2015 Twitter

2019 Synthetic Crisis Text Conversations


2014 Facebook

2017 Twitter

7 Cups of Tea (Chat-based peer


2018 support platform)

2012 Twitter

2013 Twitter

2014 Facebook

2015 Twitter

2016 Twitter

2014 Twitter

2015 Twitter

2016 Twitter
2017 Twitter

2016 Reddit

2016 Reddit

2017 Reddit

2016 Instagram

2015 Twitter

2013 Facebook

2018 Reddit

2017 Twitter

2018 Reddit
2015 Crisis Text Line

2017 Twitter

2018 SMS

2019 Crisis Text Line

2016 ReachOut (Online Forum)

2018 YouTube

2019 Google Search


2018 Reddit

SMS, emails, and call history, social


media data (i.e., Twitter and
2018 Facebook), web browsing history

2017 Reddit

2017 Twitter

2019 Reddit
2016 Twitter

2019 Instagram

2015 Twitter

2018 Reddit

2018 Reddit

2015 Twitter

2015 Reddit, Wikipedia

2014 Sina Weibo

2015 Sina Weibo


2014 Sina Weibo

2017 Reddit

2019 Twitter

2018 Reddit

2016 Crisis Text Line

2019 Twitter

2019 3 Online mental-health forums

2019 Reddit

2019 Twitter

2019 Reddit

Death Row Last Statements, The


2019 Kernel, Tumblr
2019 Sina Weibo

2017 Twitter

2014 Twitter

2018 Twitter

2018 Twitter, Interview

2018 Twitter

2017 Twitter

2018 Facebook
2018 Facebook

2017 Instagram

2018 Instagram

2017 Instagram

2017 ReachOut (Online Forum)

2018 ReachOut (Online Forum)

2018 Reddit

2018 Reddit
2018 Twitter

2017 Reddit

2018 Reddit

2018 Reddit
2018 Reddit, Online Support Forums

2018 Reddit

2017 Reddit

2017 Reddit
2017 Sina Weibo

2015 Sina Weibo

2014 Sina Weibo, Tencent Weibo, Twitter

2014 Sina Weibo

2013 Sina Weibo


2018 Twitter, Weibo

2015 Tencent Weibo

2016 Tumblr

2015 Tumblr

2017 Tumblr
2017 Twitter

2016 Twitter

2015 Twitter
2013 Twitter

2013 Twitter

2014 Twitter
2017 Twitter

2015 Twitter

2015 Twitter

2014 Twitter

2015 Twitter, Essays


Twitter, Facebook, Ecological
2017 Momentary Assessments

2016 Twitter

2018 Facebook, Twitter

2017 Twitter

2013 Twitter

2017 Twitter
2017 Twitter

TOBYO Toshoshitsu (Disease Survivor


2014 Blogs)

2014 LiveJournal

2017 Flickr
2013 PTT (Bulletin Board System)

2013 Mixi

2019 Twitter, Facebook

2016 Sina Weibo

2013 Sina Weibo


2019 Reddit

2018 Twitter

2019 Reddit

2018 AskFM

2018 Facebook, Twitter


2019 Reddit, Online Support Forums

2017 Twitter

2019 Reddit
2017 Twitter

2019 Twitter

2018 Twitter

2018 Reddit

2013 Twitter
2018 Twitter, Essays

2019 Reddit

2017 Twitter

2019 TeenHelp.org (Forum)

2017 Twitter

2018 Reddit

2019 Twitter
2020 Flickr

2017 Reddit

2018 Reddit
Target Outcomes Labeling Methodology

Regular-expressions (e.g. "I was just


diagnosed with X"); age and gender
matched controls; manual annotation
Depression, PTSD, Control of correctness

Regular-expressions (e.g. "I was just


diagnosed with X"); age and gender
Bipolar Disorder, Depression, PTSD, matched controls; manual annotation
Seasonal Affective Disorder (SAD) of correctness

ADHD, Anxiety, Bipolar Disorder,


Borderline Personality Disorder, Regular-expressions (e.g. "I was just
Depression, Eating, OCD, PTSD, diagnosed with X"); age and gender
Schizophrenia, Seasonal Affective matched controls; manual annotation
Disorder of correctness

None (Message Retrieval Task) Trained counselor role-play


PHQ-9 survey with confirmed clinical
Postpartum Depression diagnoses

Feelings of Depression, Support for


Depression, School or Work-related
Pressures related to Depression,
Substance use to deal with Random sample of Twitter using
depression, self-harm or suicidal keyword matching; manual
thoughts annotation of categories

Survey, Users of the platform


None volunteer their cultural heritage

Depression, Control CES-D Survey

Depression CES-D Survey + BDI

Continuous Depression Score Survey (Big 5 Personality)

Depression CES-D Survey

Depression, PTSD CES-D Survey

PTSD Regular-expressions

Regular-expressions (e.g. "I was just


diagnosed with X"); age and gender
matched controls; manual annotation
Depression, PTSD of correctness
Regular expressions with manual
Suicide Attempt annotation/verification
Regular-expressions (e.g. "I was just
diagnosed with X"); age and gender
matched controls; manual annotation
Depression, PTSD, Control of correctness

Anxiety, Borderline Personality,


Bipolar, Opiate Addiction, Self Harm,
Addiction, Asperger's, Autism,
Alcoholism, Opiate Usage,
Schizophrenia, Self-harm, Suicidal
Ideation Subreddit participation

Depression, Mental Health (General),


Trauma, Bipolar, Borderline
Personality, PTSD, Psychosis, Eating
Disorders, Self Harm, Rape Survivors,
Panic, Social Anxiety, Suicidal Ideation Subreddit participation

14 mental-health related subreddits + Subreddit participation (inductive


small set of control subreddits (e.g. transfer learning applied to users
r/AskReddit) outside the original subreddits)

Topic Modeling + Clinical Annotation


of the Topics; posts sourced using
Eating Disorder keyword seeds

Keyword matching + manual


Suicidal Ideation annotation (3-levels of concern)

Depression CES-D Survey

Subreddit participation (starting a


Depression thread in r/depression)

Neuroatypicality, Suicide Attempt,


Anxiety, Depression, Eating Disorder,
Panic Attacks, Schizophrenia, Bipolar Regular-expressions; age- & gender-
Disorder, PTSD matched controls

Change in Mental Health Disorder Change in participation between


Communication various mental health subreddits
None Platform participation

Regular-expressions (e.g. "I was just


diagnosed with X"); age and gender
matched controls; manual annotation
Depression, PTSD, Control of correctness

Periods of Suicide Attempts, Suicidal


Ideation, Depressive Episodes, Self-identification during in-person
Positive Mood interview

None None
Moderator Annotated (4-levels of
Self-harm risk)
Keyword Video Search, Manual
Suicidal Ideation Comment Annotation

Self-esteem Survey + Rosenberg Self-Esteem Scale


Random sampling and then manual
annotation by expert + non-expert
Suicidal Ideation annotators; 4 levels of risk

Suicidal Ideation Survey + interview

High precision keyword search;


Depression manual annotation

Keyword-based sample;
Suicidal Risk crowdsourced and expert annotations

Regular expression, subreddit


participation, and manual annotation
Suicidal Ideation (4-levels of risk)
Regular expression matching; manual
Depressive Symptoms and Stressors annotation according to DSM-5 and
Associated with Depression DSM-IV

Non-suicidal Self-Injury Pictures of self-harm

Cyberbullying Manual Annotation

ADHD, Anxiety, Autism, Bipolar


Disorder, Depression, Eating Disorder,
Obsessive Compulsive Disorder,
PTSD, Schizophrenia High-precision regular expressions
Random sample with manual
Depression Diagnosis Date annotation

Manual Annotation, hierarchical


schema. Data queried from API using
Major Depressive Disorder keywords.

Suicidal Ideation Participation in r/SuicideWatch

Suicidal Ideation Survey (Suicide Probability Scale)

Suicidal Ideation Survey (Suicide Probability Scale)


Suicidal Ideation Manual Annotation

Subreddit Participation; classes


distinguished based on time period of
posting and movement between
r/SuicideWatch and other mental-
Suicidal Ideation health subreddits

Regular-expressions (e.g. "I was just


diagnosed with X"); age and gender
matched controls; manual annotation
Depression, PTSD, Control of correctness

High precision keyword search;


Depression manual annotation

Survey (interactive text-messaging


Counseling Outcome sessions) regarding success of session
Keyword matching + manual
Suicidal Ideation annotation (3-levels of concern)

Forum participation (Depression,


Bipolar Disorder, Anxiety, Panic
Attacks, ADHD, Borderline Personality
Sentiment Disorder, OCD, PTSD)

High precision keyword search;


Depression manual annotation

Regular-expressions (e.g. "I was just


diagnosed with X"); age and gender
matched controls; manual annotation
Depression, PTSD, Control of correctness
Manual annotation via Mechanical
Stress Turk (Stress, Not Stress, Can't Tell)

Death Row last statements (Texas


Department of Criminal Justice);
Suicide, Imminent Death, Depression, Known suicide and depression notes
Loneliness on Tumblr
Commenting in "Tree Hole" (a page
on Sina Weibo from an individual who
Suicidal Ideation died by suicide)

Suicidal Ideation Regular expressions

Key phrases/Keywords based on


suicide-related terms. Additional set
of filter criteria to remove
sarcastic/irrelevant material. Two
raters verified random sample of
1,000 tweets (agreement 79.6% of
the time). Of the 1,000 tweets, 789
Suicidal Ideation were found to be relevant.

CLPsych: Regular-expressions (e.g. "I


was just diagnosed with X"); age and
gender matched controls; manual
annotation of correctness

Bell Let's Talk: Hashtag (#BellLetsTalk)


Depression and manual annotation

Interview Corpus: manual annotation


from 3 human judges.
Twitter Corpus: Distant supervision
based on hashtag usage at the end of
Stress a Tweet

Key phrases/Keywords - Unclear how


Suicidal Ideation the negative class is determined

Bell Let's Talk: Hashtag (#BellLetsTalk)


and manual annotation; additional
filtering done to remove promotional
Depression material

International Classification of Disease


Depression (ICD) codes in medical records
Life Satisfaction, Depression Satisfaction with Life Scale, CES-D

Depression CES-D

Depression PHQ-8 survey

Depression, Drug Use, Alcohol Use,


Sleep Disorder, Eating Disorder Hashtags, Sentiment
Manual annotation via forum
Self-harm/Suicidal Ideation moderators

Manual annotation via forum


moderators, used to train a model
Change in Distress (Self-harm/Suicidal that labels severity into Green (safe)
Ideation) and Flagged (Top 3 levels of crisis)

Manual annotation of posts with


r/SuicideWatch, r/Depression,
Suicidal Ideation r/Anxiety, and r/ShowerThoughts

Binary classification task of whether a


Weight loss support vs. Eating- comment was posted in r/Loseit or
disorder encouragement r/proED
Regular expressions to identify
Twitter users with self-disclosed
anxiety; manual annotation of validity
for the users. Tweets annotated as
being anxiety-related based on
keywords (LIWC + embedding
Change in social interaction based on similarity expansion) and model
inferred anxiety trained on r/Anxiety vs. Control data

Classification on a post-level of each


mental health condition. Assignments
based on subreddit. Authors manually
Borderline Personality Disorder, labeled 160 posts from the given 16
Bipolar Disorder, Schizophrenia, mental-health subreddits to verify
Anxiety, Depression, Self-harm, that they consistently matched the
Suicidality, Addiction, Alcoholism, nominal purpose of the given
Opiates, Autism, and Control subreddit

Identified users who posted in anxiety


related subreddits and then selected
control users based on the non-
mental-health-related subreddits that
Anxiety the anxiety-users tended to post in

Classification on a post-level of each


mental health condition. Assignments
based on subreddit. Authors manually
Borderline Personality Disorder, labeled 160 posts from the given 16
Bipolar Disorder, Schizophrenia, mental-health subreddits to verify
Anxiety, Depression, Self-harm, that they consistently matched the
Suicidality, Addiction, Alcoholism, nominal purpose of the given
Opiates, Autism, and Control subreddit
External data set of posts made in an
online depression Forum (Ramirez-
Esparza et al. 2008) and online breast
cancer support forum (Gorbunova
2007); Reddit data set curated using
regular expressions on a post-level
(e.g. I was diagnosed with) in the
r/Depression subreddit; Control users
sampled from breast cancer, family,
Depression, Breast Cancer Support, and relationship support subreddits;
Family Support, Relationship No use of manual verification (expect
Support false positives)

Depression users identified from


r/depression based on regular
expressions (disclosures) and then
manually authenticated by authors
(require physician diagnosis). Control
group randomly sampled from non-
depression subreddits and users who
posted in r/depression but didn't
Depression report having depression

Collect posts from r/stress


community (1402) to use as "high
stress" class. 100,000 random posts
sampled by crawling landing page of
Stress Reddit to create "low-stress" class.

Collect posts from anxiety-related


subreddits (r/anxiety, r/panicparty,
r/healthanxiety, r/socialanxiety).
Sampled posts from other first-
person point-of-view subreddits (see
Anxiety paper) to serve as the control group
Web-based survey asked respondents
to fill out Chinese version of Suicide
Probability Scale (SPS), Chinese
version of the DASS-21 was used to
measure the respondents' emotional
distress, and Weibo Suicide
Communication (single question
asking whether a respondent posted
anything in the last 12 months
Suicidal Ideation, Depression, indicating they wanted to kill
Anxiety, and Stress themselves)

Human annotators determined level


of suicidal ideation in each microblog.
3 Levels (suicide warning sign but no
plan, suicide plan but no attempt,
plan and attempt). All levels
transformed into a single binary task:
Suicidal Ideation suicidal ideation or not

Regular expressions find "I feel


stressed" vs. "I feel relaxed" on a
Stress weekly level per individual

Hashtag-based labeling approach,


hashtags fall into 5-categories
(affection, work, social, physiological,
and others). Also crawl tweets with
no stressed hashtags. Randomly
sampled 500 tweets for labeling by 3
people to validate accuracy of distant
Stress supervision (95% accuracy)

Questionnaire-based and interviews


Depression by psychologists
Regular expressions, manual
verification. Control group sampled
randomly. Consider sample of post
history 4-weeks around post used for
Depression labeling

Authors interviewed high school


students and had them manually scan
their own tweets to annotate them
Stress on four stress categories

Snowball sampling to identify eating


disorder-related posts (304 tags,
55,334 posts, 18,923 users); collect
historical data for 13,317 active users;
identified subset of tags related to
recovery and relapse, requiring 5
distinct posts with associated tags to
be assigned to recovery or relapse
group; manual verification (Cohen's
kappa .83) for random sample of 150
recovery users by researchers and
Anorexia (Recovery) clinical psychologist

Snowball sampling to identify eating


disorder-related posts (304 tags,
55,334 posts, 18,923 users); separate
pro-recovery and pro-ana
communities based on tag subset
(identified using co-occurrence
methods); sampled 32,000 control
posts using 10 most frequent tags
(e.g. GIF, art, food) meant to
Anorexia (Recovery), Anorexia represent general Tumblr use

Query posts containing "personal",


"lonely", "pathetic", and "sad" tags
(493 total, 459 actually readable);
each post annotated by 4 of the
authors as exhibiting distorted or
Cognitive Distortion undistorted thought patterns
Regular expressions to identify
Twitter users with self-disclosed
schizophrenia (21,254 posts by
15,504 users); randomly sampled 671
users for manual clinical appraisal;
control group was random sample
w/o mentions of schizophrenia or
psychosis; psychiatrist and graduate-
level mental health clinician used
disclosure tweet and +- 10
surrounding tweets to verify
Schizophrenia authenticity

Recruit participants via MTurk;


participants take DSI-SS (Depressive
Symptom Inventory-Suicide
Subscale). Participants with score > 2
Suicidality labeled as suicidal.

Use 4 suicide related websites


(experienceproject, enotalone,
takethislife, recoveryourlife) and
Tumblr as data for identifying suicide-
related lexicon (via TF-IDF); track 62
resulting keywords (n-grams 1-5) on
Twitter for 6 weeks starting in Feb.
2014; sample 800 tweets containing
keywords and an additional 200
containing names of publicized
suicides; crowdsource 4 annotations
per tweet amongst 7 categories to
distinguish suicide relevance;
removed tweets with < 75%
Suicidal Ideation agreement
Use newspaper birth announcements
to construct lexicon of n-grams likely
to indicate birth; name-based sex
inference identifies females;
MTurkers shown +- 5 posts around
possible announcement to verify
new-mother status (5 ratings per
candidate mother); multiple binary
classification tasks based on extreme
Behavioral Change for New Mothers change for 33 behavioral measures
(re: Postpartum Depression) (threshold selected per measure)

CES-D Survey on MTurk + self-


reported experience with depression;
user-groups based on extreme values
(low + high) of CES-D; user-level
Depression labels propagated to post-level labels

Start with corpus of 2.5 million


tweets (Sadilek et al. (2012)) from
6,237 users in New York City;
Sampled 1,370 tweets from 2000
with highest LIWC "sad" score and an
additional disjoint 630 tweets
matching suicide-risk-factor
keywords; half tweets annotated by
novice and half tweets annotated by
counseling psychologist with 4 labels
(Happy, No distress, low distress, high
distress) based on +- 3 tweets around
Distress Level match
Self-disclosed diagnosis statements
and manual verification; control
group contains age- and gender-
matched controls; Same as "Multi-
Task Learning" (Benton et al.) dataset,
with filtering down to users who post
Suicide Attempt, Schizophrenia, multiple times within a 3-hour
Panic, Eating, Anxiety window

Schizophrenia identified if 2+
conditions hold: self-disclose
diagnosis in user description, self-
disclose in status update, follows
@schizotribe; control set sampled
from 1% random stream and age-
Schizophrenia matched (manually)

Self-disclosed statements identified


using regular expressions; each
disclosure manually-verified by one of
the authors; age- and gender-
matched control group sampled from
Schizophrenia 1% random stream

Spanish and Portuguese regular


expressions to identify tweets likely
to contain each disease; tweets were
manually annotated as positive
(content indicates user has disease),
negative (content indicates user does
not have disease), or undecided
(neither) performed by 2 medical
Depression, Eating Disorders doctors and three engineers

CLPsych 2015 Shared Task Data (e.g.


regex for self-disclosed diagnoses);
Pennebaker and King (1999) stream-
of-consciousness essays where
students write down their thoughts,
sensations, and feelings as they come
Depression to them
Mood and Valence captured using
Photographic Affect Meter in EMAs;
Participants took a battery of
validated questionnaires (Perceived
Stress Scale, Depression Anxiety and
Stress Scale, Flourishing Scale) at start
of study; Supplemental Twitter data
labeled using regular expression
Mood Instability, Bipolar Disorder, disclosures with manual verification
Borderline Personality Disorder of authenticity

Identify candidate users using


followers of popular Borderline
Personality and Bipolar Disorder
accounts, Regular expressions against
Twitter profiles to identify Borderline
Personality and Bipolar Disorder;
Borderline Personality Disorder, random users sampled in equal
Bipolar Disorder quantity to form control group

Participants took PHQ-9 within


Depression mobile app MoodPrism

Regular expressions (strict match) to


identify depressed users, no mention
of "depress" character string to
annotate control group; create
candidate depression group using
Depression loose character match to "depress"

Zung's Self-rating Depression Scale


used to evaluate level of depression
Depression in each individual

Depression users identified using


small set of relevant keywords and
explicit reporting of taking anti-
depression medication; Control group
consists of random sample of users in
Depression the United States
Use regular expressions (eating-
disorder related keywords) within
profiles to identify candidates and
further require that profiles contain
biological information; snowball-
sampling followers of original
candidates used to expand set;
manual labeling of 1000 samples to
quantify precision of labeling process;
2 control groups sampled (1 randomly
and 1 based on followers of popular
music artists with name-inferred
Eating Disorder female sex)

Blog articles classified by symptoms


(e.g. depression, breast cancer);
Depression group randomly selected
based on depression tag, non-
depression selected randomly
Depression amongst remaining group

Identified mental-health communities


(e.g. adult_bipolar, alonedepressed)
and general communities (e.g.
curlyhair, cat_lovers) to serve as
Depression, Self-Harm, Suicide, comparison groups CLINICAL vs.
Bipolar Disorder, Grief CONTROL

Use "selfharm" and "selfinjury" tags


to seed search of Flickr for additional
high-precision relevant tags; identify
15 additional tags to identify
additional candidates; remove users
in candidate pool who use self-harm
tags in less than 5 posts; control
group sampled from YFCC dataset
and confirmed not to use any self-harm
tags; researchers manually verified
Self-harm subset of self-harm posts
Users of the "Prozac" message board
labeled as depressed, while those on
the "Sad" message board labeled as
having ordinary sadness; Also look at
two happiness message boards
"gossiping" and "happy" to represent
messages with non-negative
Depression emotions

Users of 4 suicide-related forums and


10 depression-related forums on the
platform considered as part of the
"suicide" and "depression" groups
respectively, while users who
did not participate in the forums but
were active were considered
Suicidal Ideation "control"

Consider 4 types of labels: Affiliation


Data (based on following
Schizophrenia and Related Disorders
Alliance of America account), Self-
Report (e.g. regular expressions);
Clinically-appraised Self-report (e.g.
self-report + clinician assessment of
history), Schizophrenic Patients in IRB
Study. Control data sampled to match
based on data characteristics (e.g.
Schizophrenia language, followers, friends)

Used word-embeddings to identify


terms likely to represent stressors
and stress subjects; manually
annotated 2,000 posts with stressor
and subject and an additional 600
Stress, Stress (Stressor and Stress posts considered as non-stress
Subject) related

Users detected by group of


psychologists with traditional
diagnosis criteria through
Depression questionnaires and surveys
Regular expression, subreddit
participation, and manual annotation
Suicidal Ideation (4-levels of risk)

Queried 2000 tweets, retweets, and


mentions of @TyquanAssassin;
manually filtered out tweets that did
not specifically reference 2 associated
deaths of the case study; manual
Grief, Aggression coding of all remaining tweets

Sample 53 posts from eating disorder


subreddits for manual binary
annotation (positive for immense risk,
negative otherwise); 6,000 additional
posts sampled as being the "hottest"
submissions on EatingDisorders,
BingeEatingDisorder,
eating_disorders, bulimia, proED, and
fuckeatingdisorders; Coders manually
evaluated top 50 (114 unique) most
likely risky posts as labeled by 5
Eating Disorder classifiers

Annotators were provided detailed


schematic for labelling; 2-levels
(relation to cyberbullying and then
type of cyberbullying); 7-types of
Cyberbullying cyber bullying

All comments manually labeled with


3-levels of aggression: overtly
aggressive, covertly aggressive, and
Aggression non-aggressive
External data set of posts made in an
online depression Forum (Ramirez-
Esparza et al. 2008) and online breast
cancer support forum (Gorbunova
2007); Reddit data set curated using
regular expressions on a post-level
(e.g. I was diagnosed with) in the
r/Depression subreddit; Control users
sampled from breast cancer, family,
and relationship support subreddits;
No use of manual verification (expect
Depression false positives)

Query Twitter streaming API using


keywords from APA's list of risk
factors and AAS's list of warning signs
related to suicide; identify 60
distressed users amongst sample that
frequently discuss depression,
suicide, and self-mutilation (in
addition to 60 other random users);
remove 500/5,446 tweets for manual
labeling (no distress, minimal distress,
Suicidal Ideation moderate distress, severe distress)

Task 1 (Anorexia): Regular-


expressions identify diagnosis
disclosure; Task 2 (Depression): RSDD
Dataset, regular expressions; Task 3
(Level of Depression): Beck's
Depression Inventory Questionnaire
Anorexia, Depression BDI
Identified tweets using the
#MyDepressionLooksLike hashtag;
filter down to original tweets from
human authors (e.g. no PSAs or
spam); each tweet manually coded by
2 annotators for theme
(Dysfunctional thoughts, Lifestyle
challenges, social struggles, hiding
behind a mask, apathy and sadness,
Depression and relief seeking)

Search for tweets using phrases e.g. "I


do not care about the law", "I wish
you die soon", and "Go to hell";
Manually annotated all tweets as
conveying antisocial behavior or not;
psychology graduate student verified
Antisocial Behavior annotations

Data from OurDataHelps.org (social


media data + history of mental
health); regular-expressions to
identify past suicide attempts +
manual verification; age- and gender-
Suicide Attempt matched controls

Regular expressions and flair within


bipolar disorder subreddits; control
groups sampled from over-indexing
subreddits (non-bipolar); filter out
users with less than 1000 words; filter
Bipolar Disorder out posts mentioning bipolar disorder

CES-D identified individuals with


depression; interviews of participants
coded manually by authors for
Depression qualitative analysis
Essays come from patients with
diagnosed Schizophrenia (and healthy
controls); Twitter data comes from
users with self-disclosed diagnoses
(e.g. regular-expressions) and age-
Schizophrenia and gender-matched controls

Regular-expressions identify diagnosis


disclosure; controls mentioned
anorexia or participated in discussion
Anorexia but did not have diagnosis

Regular-expression matching against


Twitter user profiles (phrases highly
indicative of disease); filter out users
with less than 100 tweets (no manual
Depression verification of authenticity)

Collect top-level posts from "Self-


harm", "Depression and Suicide", and
"Friends and Family" subforums;
restrict to users with self-identified
age under 25 and gender being male
or female; human annotators (3)
coded each of the posts based on 11
Self-harm topics

Leverage annotation schema from


prior work to label each tweet with
symptomology + relevance to
Depression (Symptoms) depression

Anxiety, Borderline Personality,


Bipolar, Opiate Addiction, Self Harm,
Addiction, Asperger's, Autism,
Alcoholism, Opiate Usage,
Schizophrenia, Self-harm, Suicidal
Ideation Subreddit participation

Depression CES-D
Mental Health (General) Tag-based proxy labelling

Regular expressions used to identify


individuals with diagnosis; control
groups include users who often post
about the disorder but do not have it
(e.g. support a peer or family
Depression member)

Regular expressions used to identify


individuals with diagnosis; control
groups include users who often post
about the disorder but do not have it
(e.g. support a peer or family
Depression, Anorexia member)
Size Availability

Train (326 depressed, 246 PTSD, 573


control); Test (150 depressed, 150
PTSD, 300 control) Available via Signed Agreement

Bipolar: 394 individuals (992k tweets)


Depression: 441 individuals (1.0M
tweets)
PTSD: 244 individuals (573k tweets)
SAD: 159 individuals (421k tweets)
Control: 5728 individuals (13.7M
tweets) Available via Signed Agreement

ADHD: 102 individuals (384k tweets)


Anxiety: 216 individuals (1.591M
tweets)
Bipolar: 188 individuals (730k tweets)
Borderline Personality: 101
individuals (321k tweets)
Depression: 393 individuals (546k
tweets)
Eating: 238 individuals (724k tweets)
OCD: 100 individuals (314k tweets)
PTSD: 403 individuals (1.251M
tweets)
Schizophrenia: 172 individuals (493k
tweets)
Seasonal Affective: 100 individuals
(340k tweets) Available via Signed Agreement

253 conversations; 9,062 visitor


messages; 5,320 counselor messages;
2,999 counselor paraphrases
165 individuals (137 w/o PPD, 28 w/
PPD); 578,200 data points (wall posts,
videos, photos, links, and check ins);
separated by pre and post-natal
period

2,000 tweets

1,593 individuals

69 individuals (41 low-mild


depression, 28 high depression);
5,706 tweets Not Available

476 individuals

28,479 individuals Not Available (MyPersonality Dataset)

208 individuals (81 depressed)


Depression: 105 individuals
PTSD: 63 individuals Not Available

PTSD: 244 individuals


Control: 6,100 individuals Available via Signed Agreement

Bipolar: 394 individuals (992k tweets)


Depression: 441 individuals (1.0M
tweets)
PTSD: 244 individuals (573k tweets)
SAD: 159 individuals (421k tweets)
Control: 5728 individuals (13.7M
tweets) Available via Signed Agreement
250 individuals (125 w/ past suicide
attempt, 125 control) Available via Signed Agreement
Train (326 depressed, 246 PTSD, 573
control); Test (150 depressed, 150
PTSD, 300 control) Available via Signed Agreement

All individuals who posted in


manually identified subreddits
associated with each of the mental-
health disorders Reproducible via API

880 individuals (random sample, even


split between general mental health
and r/SuicideWatch) Reproducible via API

21,734 mental-health comments +


21,734 control comments (~15k
individuals each) Reproducible via API

26M posts; 100k individuals

2000 posts (14% high, 56% possibly,


29% safe) Not Available

55 users

12,106 individuals total (4,947


depressed) Reproducible via API

9,611 individuals with an average


3,521 tweets per individual Available via Signed Agreement

641 individuals in increase group, 368


individuals in no change group, 758
individuals in decrease group. Reproducible via API
469,849 counselor messages and
412,050 caller messages CTL Research Fellows Only

Train (326 depressed, 246 PTSD, 573


control); Test (150 depressed, 150
PTSD, 300 control) Available via Signed Agreement

33 individuals (189,478 messages) Available Pending Future Exploration

469,849 counselor messages and


412,050 caller messages CTL Research Fellows Only
65,024 forum posts (of which only
1,227 have labels) Available via Signed Agreement

5,051 comments

108 individuals (2-months of logs)


934 individuals (sourced from
r/SuicideWatch) + Equal number of
control individuals Available via Signed Agreement

33 individuals with multiple platforms


(26 with SMS); > 1M incoming and
outgoing messages Available Pending Future Exploration

9,210 diagnosed individuals, 107,274


control individuals (control
individuals based on distance of
subreddit probability distributions) Available via Signed Agreement

2,000 tweets

31,554 posts from 496 users. Based


on "Expert, Crowdsourced, and
Machine Assessment of Suicide Risk
via Online Postings". Tasks based on
which subreddits the model has
access to data from. Available via Signed Agreement
9,473 annotations for 9,300 tweets (9
depressive stressors and 12
psychosocial stressors) Available

52 individuals (data restricted) Not Available

7,321 tweets

ADHD: 10,098 individuals


Anxiety: 8,783 individuals
Autism: 2,911 individuals
Bipolar: 6,434 individuals
Depression: 14,139 individuals
Eating: 598 individuals
OCD: 2,336 individuals
PTSD: 2,894 individuals
Schizophrenia: 1,331 individuals Available via Signed Agreement

598 Comments Available via Signed Agreement

129 Tweets Available

66,059 posts from 19,159 individuals Available

1,038 individuals (Up to 2,000


messages)

909 individuals
1,053 individuals (6,754 posts)

440 individuals MH to SW (62,024


comments of support from 32,362
unique users);
440 individuals MH (41,894
comments of support from 21,358
unique users) Reproducible via API

Train (326 depressed, 246 PTSD, 573


control); Test (150 depressed, 150
PTSD, 300 control) Available via Signed Agreement

9,210 diagnosed individuals, 107,274


control individuals (control
individuals based on distance of
subreddit probability distributions) Available via Signed Agreement

408 counselors; 3.2 million messages;


80,885 conversations Available
2,000 posts (14% high, 56% possibly,
29% safe) Not Available

49,113 threads; 500,754 posts;


75,000 individuals

9,210 diagnosed individuals, 107,274


control individuals (control
individuals based on distance of
subreddit probability distributions) Available via Signed Agreement

Train (326 depressed, 246 PTSD, 573


control); Test (150 depressed, 150
PTSD, 300 control) Available via Signed Agreement
3,554 labeled data points for 2,929
posts Available

Last Statements: 431 notes


Suicide Notes: 161 notes
Depression Notes: 142 notes
7,329 individuals

Candidate disclosures: 51,038,914


posts from 470,337 individuals
Control: 66,214,850 posts from
480,685 individuals

733,011 tweets from 594,776


individuals Reproducible via API

CLPsych: Train (326 depressed, 246


PTSD, 573 control); Test (150
depressed, 150 PTSD, 300 control)

Bell Lets Talk: 154 individuals (53


depressed) Available via Signed Agreement

Natural Stress Emotion Corpus: 38


student interviews with 2,272 binary-
labeled utterances
Stress Twitter Corpus: 367,312
Tweets (59,768 stressed)

Not stated Reproducible via API (but lacking clear instructions)

154 individuals (53 depressed)

683 patients (114 depressed) Not Available (PHI restrictions)


SWLS: 1298 SWLS <25, 785 SWLS >=
25
CES-D: 148 CES-D < 20; 466 >= 20 Not Available (MyPersonality Dataset)

166 individuals (71 of whom had a history of depression) Not Available (confidentiality restrictions)

749 individuals Not Available (protocol restrictions)

Posts Per Class


-- Depression: 18,203
-- Drug Use: 138,021
-- Drinking: 4,979
-- Sleep Disorder: 4,758
-- Eating Disorder: 234
No Control, Just Choosing Which
Disorder
1188 labeled posts (40 crisis, 137 red,
296 amber, 715 green) Available via Signed Agreement

1,040 threads Available via Signed Agreement

785 posts

r/loseit: 2.3 million comments in 164k


posts
r/proED: 123k comments from 8.5k
posts Reproducible via API
200 individuals (209,290 tweets) Reproducible via API

# of Comments: Borderline (11,880),


Bipolar (41,636), Schizophrenia
(4,963), Anxiety (57,523), Depression
(197,436), Self-harm (17,102),
Suicidality (90,518), Addiction (4,360),
Opiates (65,143), Autism (9,470),
Control (476,388) Reproducible via API

1,569 documents (523 anxiety


concatenated histories, 523 anxiety
concatenated histories from non-
anxiety forums, 523 comparison
concatenated histories from
members not in non-anxiety forums) Reproducible via API

# of Comments: Borderline (11,880),


Bipolar (41,636), Schizophrenia
(4,963), Anxiety (57,523), Depression
(197,436), Self-harm (17,102),
Suicidality (90,518), Addiction (4,360),
Opiates (65,143), Autism (9,470),
Control (476,388) Reproducible via API
400 posts per data set group Reproducible via API

Up to 2000 posts per individual. Final


data set had 531,453 submissions
from 892 users (125 depressed).

Randomly balanced classes, sampling


~2000 posts total for their data set Reproducible via API

Anxiety (9971 posts), Control (12,837


posts) Reproducible via API
974 respondents. 117 have WSC, 190
high suicide risk, 49 severe
depression, 140 severe anxiety, 45
severe stress

7314 posts (664 suicide)

492676 posts (239038 stressed).


23304 individuals (11074 stressed) Reproducible via API

57785 tweets (14931 not stressed)

122 depressed, 346 non-depressed


individuals (6013 posts total)
Weibo: 580 depressed, 580 control
Twitter (Shen 2017): 1394 depressed,
1394 control

36 individuals (21,648 tweets)

13,317 users (2,353 recovery, 10,964


non-recovery); 68 MM posts (25MM
recovery, 42MM non-recovery). Posts
shared between 2/20/2007 and
8/4/2014 Reproducible via API

Anorexia: 55,334 posts, 18,293 users


(11,301 pro-recovery, 44,033 pro-
ana); Control: 32,000 posts Reproducible via API

459 posts (206 distorted, 252


undistorted)
Schizophrenia Authenticity (146 yes,
101 maybe, 424 no Users) with
(1.9M, 1.5M, and 8.8M tweets,
respectively); Additional sample of
100 users (18 authenticated positive
by experts)

135 Participants (17 Suicidal)

816 Tweets (13% evidence of possible


suicidal ideation)
376 validated new mothers (36,948
posts prenatal, 40,426 post-natal)

Depression (117 users, 23,984 posts);


Control (157 users, 45,530 posts) Not Available

2000 tweets
Anxiety (2,408 users), eating disorder
(749 users), panic attacks (263 users),
schizophrenia (350 users), and suicide
attempt (424 users)

Schizophrenia (96 users), Control (200


users)

Schizophrenia (174 users), Control


(174 users)

Depression (Spanish: 3,253 tweets


[160 positive], Portuguese: 2,846
[120 positive] tweets); Eating
Disorders (Spanish: 412 [111 positive]
tweets, Portuguese: 468 [87 positive]
tweets)

CLPsych Data (~600 depression +


~600 age- & gender- matched
controls); Stream-of-consciousness
essays (6,459 individuals)
EMA (51 participants, 1,606
responses), Facebook CL Study (23
participants, 13,340 status updates),
Twitter CL (10 participants, 1425
tweets); Twitter Bipolar (6,326 users,
14M tweets); Twitter Borderline
(3,238 users 7M tweets); Twitter
Control (9,394 users, 15M tweets)

Bipolar Disorder (278 users),


Borderline Personality Disorder (203
users), Control (548 users)

Facebook (538 status updates, 29


users), Twitter (1,318 posts, 49 users)

Depression: 1402 users (292,564


tweets)
Control: >300M users (>10 billion
tweets)
Candidate Depression: 36,993 users
(35M tweets) Freely available for download

50 participants

Depression (50 users); Control (100


users)
Eating disorder (3,380 users),
Random Control (30,684 users),
Young + Female Control (37,983
users)

Depression (100 authors), Non-


Depression (100 authors)

Clinical (38,401 posts from 24


communities), Control (229,563 posts
from 23 communities)

Self-harm (20,495 users, 93,286


posts), Control (19,720 users, 93,286
randomly sampled posts)
Gossiping (1,699 users, 6,505 posts),
Happy (2,695 users, 11,209 posts),
Prozac (1,027 users, 6,015 posts), Sad
(1,652 users, 4,900 posts)

Suicide (9,990 users), Depression


(24,410 users), Control Group
(228,949 users)

Affiliation: 1847 users


Self-report: 412 users
Clinically-appraised Self-report: 153
users
Patients: 88 patients
Control: Equal number for each type

Stress (2,000 posts), Control (600


posts) Available

Depression (90 users), Non-


depression (90 users)
31,554 posts from 496 users. Based
on "Expert, Crowdsourced, and
Machine Assessment of Suicide Risk
via Online Postings". Tasks based on
which subreddits the model has
access to data from. Available via Signed Agreement

General (718 tweets)

Positive (38 posts), Negative (15


posts), Unlabeled (6,000)

English (113,698 posts [5,375


cyberbullying]), Dutch (78,387 posts,
[5,106 cyberbullying]) Available via Signed Agreement

Facebook (15,000 comments), Twitter


(1,257 English + 1,194 Hindi Tweets) Available
Depression (1,293 posts), Non-
depression (548 posts) Reproducible via API

Depression (60 users, 2,381 tweets),


Non-depression (60 users, 3,065
tweets)

Task 1: Anorexia (61 users, 24,874


posts), Non-anorexia (411 users,
228,878 posts); Task 2: Depression
(9,210 users), Non-depression
(107,274 users); Task 3: Level of
Depression (20 users) Available via Signed Agreement
1,978 tweets

55,810 tweets

Suicide Attempt (418 users, 197,615


posts), Controls (418 users, 197,615
posts)

Bipolar Disorder (3,488 users);


Control (3,931 users) Reproducible via API

Depression (7 participants), Non-


depression (7 participants) Not Available
Essays: Schizophrenia (93 patients),
Control (95 patients); Twitter:
Schizophrenia (174 users), Control
(174 users)

Anorexia (61 users, 24,874 posts),


Non-anorexia (411 users, 228,878
posts) Available via Signed Agreement

Depression (2,000 users), Control


(2,000 users)

2,359 posts

9,300 tweets

All individuals who posted in


manually identified subreddits
associated with each of the mental-
health disorders Reproducible via API

45 individuals Not Available


Mental Illness (770 users, 14,781
posts); Pre-mental Illness (658 users,
11,828 posts); Healthy Users (15,000
users, 15,000 posts) Reproducible via API

Depression (135 users, 49,557 posts);


Control (752 users, 481,337 posts) Available via Signed Agreement

Task 1: Depression (214 users, 90,222


posts), Control (831 users, 0.9M
posts); Task 2: Anorexia (61 users,
24,874 posts); Control (411 users,
228,878 posts) Available via Signed Agreement
Additional Comments Dataset Link (if any)
Japanese Language Only
Korean Language Only
Chinese Language Only
http://ir.cs.georgetown.edu/resource
s/rsdd.html
https://research.cs.wisc.edu/bullying/
data.html

http://ir.cs.georgetown.edu/resource
s/smhd.html
http://ir.cs.georgetown.edu/resource
s/

Pilot Study

Chinese Language Only

Chinese Language Only


Chinese Language Only

http://ir.cs.georgetown.edu/resource
s/rsdd.html

http://snap.stanford.edu/counseling/

http://ir.cs.georgetown.edu/resource
s/rsdd.html

http://www.cs.columbia.edu/~eturca
n/data/dreaddit.zip
Chinese Language Only

Not a predictive task, but rather a


tracking study.

Bell Lets Talk dataset detailed in Jamil


et al., 2017 (Masters Thesis)
CLPsych 2016 dataset

CLPsych 2016 dataset


Same dataset as "The language of
mental health problems in social
media" (2016)

Same dataset as "The language of


mental health problems in social
media" (2016)
Leverages external data sets from
Ramirez-Esparza 2008 and Gorbunova
2007 (online support forums)

eRISK 2017 Dataset

Also identified data in college


subreddits (pre- and post- campus
shooting incidents), but it doesn't
have any ground truth associated
with it.
Chinese Language Only

Chinese Language Only

Chinese Language Only; also


examined performance on 3 other
datasets (construction unclear) on
Sina Weibo, Tencent Weibo, and
Twitter

Chinese Language; Also includes


images

Chinese Language Only


Uses Shen et al. 2017 for Twitter data
set

English only
Spanish, Portuguese language from
Spain and Portugal, respectively

Leverage existing CLPsych 2015


dataset for depression classification
CL: CampusLife Study at Georgia Tech

http://depressiondetection.droppage
s.com

Japanese Language Only


Japanese Language Only
Chinese Language

Japanese Language Only

English http://stressmeasure.droppages.com

Data comes from "A depression


detection model based on sentiment
analysis in micro-blog social
network"; Chinese language only
CLPsych 2019 dataset

https://osf.io/rgqw8/

Subset of Kumar et al. (2018); English


+ Hindi versions http://trac1-dataset.kmiagra.org/
Leverages data from Pirina et al.
(2018)

Task 1 comes from eRisk 2018, Task 2


comes from RSDD (Georgetown),
Task 3 is new data
eRisk 2019 Dataset

Builds upon Mowery et al. 2016

From Gkotsis et al. (2016)

Korean Language Only


Dataset originally proposed in Losada
et al. (2016) "A test collection for
research on depression and language
use"
Reference Link

https://www.aclweb.org/anthology/
W15-1204/

https://www.aclweb.org/anthology/
W14-3207/

https://www.aclweb.org/anthology/
W15-1201/

https://www.aclweb.org/anthology/
W19-3001/
http://www.munmund.net/pubs/csc
w_14_1.pdf

https://www.sciencedirect.com/scien
ce/article/pii/S0747563215300996

https://pdfs.semanticscholar.org/e88
6/3d0ace1ad50f2fd9bc64ea953df827
1a60c1.pdf

https://pdfs.semanticscholar.org/8dd
5/8913bd343f4ef23b8437b24e152d3
270cdaf.pdf

https://www.aaai.org/ocs/index.php/
ICWSM/ICWSM13/paper/viewFile/61
24/6351

https://www.aclweb.org/anthology/
W14-3214/
https://dl.acm.org/citation.cfm?
id=2702280
https://www.nature.com/articles/s41
598-017-12961-9

https://www.aaai.org/ocs/index.php/
ICWSM/ICWSM14/paper/viewFile/80
79/8082

https://www.aclweb.org/anthology/
W15-1203/
https://www.aclweb.org/anthology/
W16-0311/
https://arxiv.org/pdf/1607.07384.pdf

https://www.aclweb.org/anthology/
W16-0307/

https://www.ncbi.nlm.nih.gov/pmc/a
rticles/PMC5659860/

https://www.ncbi.nlm.nih.gov/pmc/a
rticles/PMC5565736/

http://www.munmund.net/pubs/csc
w16_MIS.pdf

https://www.sciencedirect.com/scien
ce/article/pii/S2214782915000160

https://www.jmir.org/2013/10/e217/

https://www.aclweb.org/anthology/
W18-4102/

https://arxiv.org/abs/1712.03538

http://lit.eecs.umich.edu/files/LiMiha
lceaWilson_Socinfo_18.pdf
https://dspace.mit.edu/bitstream/ha
ndle/1721.1/110590/Picard_Mixed-
initiative.pdf?
sequence=1&isAllowed=y

https://arxiv.org/abs/1705.00335

https://osf.io/6r2nq

https://www.jmir.org/2019/1/e11507
/
https://www.aclweb.org/anthology/
W16-0312/
https://link.springer.com/chapter/10.
1007/978-3-030-02686-8_30

https://www.cs.rochester.edu/u/kaut
z/papers/www_2019_detecting_low_
selfesteem_4089820.pdf
https://s3.amazonaws.com/academia
.edu.documents/56757600/W18-
0603.pdf?response-content-
disposition=inline%3B%20filename
%3DExpert_Crowdsourced_and_Mac
hine_Assessme.pdf&X-Amz-
Algorithm=AWS4-HMAC-SHA256&X-
Amz-
Credential=AKIAIWOWYYGZ2Y53UL3
A%2F20191108%2Fus-east-
1%2Fs3%2Faws4_request&X-Amz-
Date=20191108T130308Z&X-Amz-
Expires=3600&X-Amz-
SignedHeaders=host&X-Amz-
Signature=f6df0e59e77b9b066b854b
e2d29b7756d6cc6e8c4dbaecdb41ac0
61816374aef

https://www.ncbi.nlm.nih.gov/pmc/a
rticles/PMC6442737/

https://arxiv.org/pdf/1709.01848.pdf

https://arxiv.org/abs/1701.08796

https://www.aclweb.org/anthology/
W19-3003/
https://www.researchgate.net/profile
/Marina_Litvak/publication/3125912
27_Social_and_linguistic_behavior_a
nd_its_correlation_to_trait_empathy
/links/58877a724585150dde501df8/S
ocial-and-linguistic-behavior-and-its-
correlation-to-trait-
empathy.pdf#page=196

https://www.ncbi.nlm.nih.gov/pmc/a
rticles/PMC6736249/
http://pages.cs.wisc.edu/~jerryzhu/p
ub/junming-thesis.pdf

https://arxiv.org/pdf/1806.05258.pdf
http://ir.cs.georgetown.edu/downloa
ds/macavaney-clpsych2018.pdf

https://www.aclweb.org/anthology/
W15-1211/

https://www.ncbi.nlm.nih.gov/pmc/a
rticles/PMC5507358/

https://link.springer.com/chapter/10.
1007/978-3-319-15554-8_45

https://www.ncbi.nlm.nih.gov/pubm
ed/26543921
https://arxiv.org/abs/1411.0778

https://www.microsoft.com/en-
us/research/publication/language-
social-support-social-media-effect-
suicidal-ideation-risk/

https://www.aclweb.org/anthology/
W19-3013/

https://www.aclweb.org/anthology/Y
18-1070/

http://timalthoff.com/docs/althoff-
2016-mental_health.pdf
https://www.aclweb.org/anthology/U
16-1010/

https://aisel.aisnet.org/cgi/viewconte
nt.cgi?article=1269&context=wi2019

https://www.aclweb.org/anthology/D
19-5542/

https://www.aclweb.org/anthology/D
19-6208/
https://www.aclweb.org/anthology/D
19-6213/

https://www.aclweb.org/anthology/D
19-6217/
https://www.aclweb.org/anthology/D
19-1181/

https://dl.acm.org/citation.cfm?
id=2998220

https://econtent.hogrefe.com/doi/ab
s/10.1027/0227-5910/a000234?
journalCode=cri

https://www.aclweb.org/anthology/
W18-0609

https://ieeexplore.ieee.org/abstract/
document/8461990

https://ieeexplore.ieee.org/abstract/
document/8554733/

https://paulallen.ca/docs/Jamil,%20Z
%20Monitoring%20tweets%20for
%20depression%20to%20detect
%20at-risk%20users%20-
%202017.pdf
https://www.pnas.org/content/115/4
4/11203?
utm_source=yxnews&utm_medium=
mobile
https://ieeexplore.ieee.org/abstract/
document/8419355

https://epjdatascience.springeropen.
com/articles/10.1140/epjds/s13688-
017-0110-z

https://www.jmir.org/2018/12/e1181
7/

https://link.springer.com/chapter/10.1007/978-3-319-67256-4_7
https://asistdl.onlinelibrary.wiley.com/doi/full/10.1002/asi.23865

https://arxiv.org/abs/1804.07253

https://www.jmir.org/2018/6/e215/

https://dl.acm.org/doi/abs/10.1145/3173574.3174240
http://www.munmund.net/pubs/Anxiety_SocialInt_ICWSM18.pdf

https://www.nature.com/articles/srep45141

https://www.aclweb.org/anthology/W18-0620/

https://www.aclweb.org/anthology/W18-0607/
https://www.aclweb.org/anthology/W18-5903/

https://dl.acm.org/doi/abs/10.1145/3159652.3159725

https://dl.acm.org/doi/abs/10.1145/3134727

https://www.aclweb.org/anthology/W17-3107/
https://www.jmir.org/2017/7/e243/

https://www.aclweb.org/anthology/Y15-1064/

https://dl.acm.org/doi/abs/10.1145/2647868.2654945

https://ieeexplore.ieee.org/abstract/document/6890213

https://link.springer.com/chapter/10.1007/978-3-642-40319-4_18
https://eprints.soton.ac.uk/423226/

https://link.springer.com/chapter/10.1007/978-3-319-25261-2_3

https://dl.acm.org/doi/abs/10.1145/2858036.2858246

https://dl.acm.org/doi/abs/10.1145/2750511.2750515

https://ieeexplore.ieee.org/abstract/document/8031202
https://www.jmir.org/2017/8/e289/

https://mental.jmir.org/2016/2/e21/?utm_source=TrendMD&utm_medium=cpc&utm_campaign=JMIR_TrendMD_1

https://dl.acm.org/doi/abs/10.1145/2700171.2791023
https://dl.acm.org/doi/abs/10.1145/2470654.2466447

https://dl.acm.org/doi/abs/10.1145/2464464.2464480

https://www.aclweb.org/anthology/W14-3213/
https://www.aclweb.org/anthology/W17-3110/

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4525233/

https://www.aclweb.org/anthology/W15-1202/

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0086191

https://www.aclweb.org/anthology/W15-1212/
https://dl.acm.org/doi/abs/10.1145/3130960

https://ieeexplore.ieee.org/abstract/document/7752434

https://www.jmir.org/2018/5/e168/

https://nextcenter.org/wp-content/uploads/2018/02/Depression-Detection-via-Harvesting-Social-Media-A-Multimodal-Dictionary-Learning-Solution.pdf

https://ieeexplore.ieee.org/document/6549431

https://dl.acm.org/doi/abs/10.1145/3079452.3079465
https://dl.acm.org/doi/abs/10.1145/3018661.3018706

https://www.aaai.org/ocs/index.php/SSS/SSS14/paper/viewFile/7744/7782

https://ieeexplore.ieee.org/abstract/document/6784326

https://dl.acm.org/doi/abs/10.1145/3038912.3052555
https://link.springer.com/chapter/10.1007/978-3-642-37453-1_23

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0062262

https://dl.acm.org/doi/abs/10.1145/3290605.3300364

https://hcsi.cs.tsinghua.edu.cn/Paper/Paper16/IJCAI-linhuijie.pdf

https://ieeexplore.ieee.org/abstract/document/6753906
https://www.aclweb.org/anthology/W19-3005/

https://journals.sagepub.com/doi/full/10.1177/1178222618763155

https://onlinelibrary.wiley.com/doi/full/10.1002/eat.23148

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0203794

https://www.aclweb.org/anthology/W18-4401/
https://ieeexplore.ieee.org/abstract/document/8681445

https://ieeexplore.ieee.org/abstract/document/8269767

http://ceur-ws.org/Vol-2380/paper_66.pdf
https://mental.jmir.org/2017/4/e43/

https://link.springer.com/chapter/10.1007/978-3-030-22354-0_43

https://journals.sagepub.com/doi/full/10.1177/1178222618792860

https://arxiv.org/abs/1811.04655

https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.941.6087&rep=rep1&type=pdf
https://arxiv.org/abs/1810.09377

http://ceur-ws.org/Vol-2380/paper_74.pdf

https://dl.acm.org/doi/abs/10.1145/3110025.3123028

https://onlinelibrary.wiley.com/doi/full/10.1111/sltb.12569

https://www.jmir.org/2017/2/e48/

https://dl.acm.org/doi/abs/10.1145/3269206.3271732

https://info.computer.org/csdl/proceedings-article/bigcomp/2016/07425918/12OmNAkEU5D
https://www.aclweb.org/anthology/2020.lrec-1.772/

https://link.springer.com/chapter/10.1007/978-3-319-65813-1_30

https://tec.citius.usc.es/ir/pdf/eRisk2018LNCS.pdf
