
Scalable Mental Health Analysis in the Clinical Whitespace via Natural Language Processing
Glen Coppersmith1, Casey Hilland1, Ophir Frieder2, Ryan Leary1

Abstract— Our increasingly digital life provides a wealth of data about our behavior, beliefs, mood, and well-being. This data provides some insight into the lives of patients outside the healthcare setting, and in aggregate can be insightful for the person’s mental health and emotional crisis. Here, we introduce this community to some of the recent advancement in using natural language processing and machine learning to provide insight into mental health of both individuals and populations. We advocate using these linguistic signals as a supplement to those that are collected in the health care system, filling in some of the so-called “whitespace” between visits.

Fig. 1. An example person’s interactions with the health care system (red hashes), and Facebook posts (blue) over four years (x-axis). This user is from the University of Pennsylvania’s data set linking social media data and medical records [6], [7]. This paper discusses some of the scientific advances that may provide clinically useful information from the analysis of digital life data (e.g., Facebook) from the whitespace between health care visits.

I. INTRODUCTION
Mental illness causes a tremendous strain on the human race, emotionally, cognitively, and financially – and we understand relatively little about it compared to many physical illnesses and ailments. The estimated global cost of mental illness is $3.5 trillion [1]. 1 in 5 Americans (19%) experience a mental illness each year [2], between 4% and 8% have thoughts of suicide and 1% die by suicide; 42,000 died by suicide last year in the United States [3], [4]. In the U.S., an estimated 3,500 adults aged 35-64 attempt suicide per month. This represents a 28+% increase since 1999 and reaches across all geographic regions of the United States [5].

At the crux of many of these issues is emotional crisis – e.g., a suicide attempt or the loss of sobriety. These are behaviors that have a high emotional and financial toll, and are ill understood. Much of what we do know about emotional crisis and mental health comes from interactions with the healthcare system, but as Figure 1 illustrates, that is a limited view of a patient’s life. As our lives are increasingly digitally mediated (by smartphones, calendars, Internet of Things devices, etc.), we are increasing the signals relevant to our mental and behavioral health that can be analyzed. We focus on a single piece of this signal here, language, but this is one of many relevant signals that comprise our digital life; other examples include geolocation, activity, and temporal patterns of device usage. The primary purpose of this paper is to bring work relevant to the prevention, personalization, and scalable measurement of the whitespace to this community. This research has reached a level of maturity at which it may be useful for filling a gap in health care monitoring, and a critical gap in mental healthcare. Qntfy is a company devoted to the research and development of analysis of this whitespace data. The team’s interdisciplinary approach depends on their diversity of skills from psychology, data science, mathematics and computer science.

Whitespace information provides a lens through which we can analyze psychological phenomena like emotional crisis, suicide attempts, and drug relapse. This is particularly powerful for mental health, which is, by definition, expressed when a person interacts with the rest of the world, and thus notoriously difficult to assess in a lab or formal health care setting. In particular, linguistic signals from the whitespace are powerful because they are objectively recorded in the moment in a manner conducive to automated analysis, rather than recalled (with imprecision and bias) after the event. Encoded in these signals are directly observable psychologically relevant phenomena (e.g., social media interactions with friends), indirectly observable phenomena (e.g., social events that the user may or may not have attended), and further latent signals (subconscious language choices). These signals can empirically inform caregivers, clinicians, and patients about wellness outside of clinical interactions. Thus, they are likely best used as a supplement to existing health care data rather than to the exclusion of trained professionals and clinical measurements. The application of natural language processing (NLP) to data in the whitespace between clinical encounters has progressed sufficiently to warrant design discussions around integration into existing systems of care.

To illustrate some of the power of even straightforward analytics in this space, we also present some novel analysis using some of these psychological classifiers on the communications of a population and demonstrate correlations between language usage and real world events affecting that population.

A. Linguistic Signals
Linguistics provides an interesting lens through which one can examine and research mental health. Early studies focused on the writings of those with mental health conditions, but with the emergence of computational linguistics and corpora analysis, this capability was accelerated. The Linguistic

1 Qntfy {glen,casey,ryan}@qntfy.com
2 Georgetown University ophir@ir.cs.georgetown.edu

978-1-5090-4179-4/17/$31.00 ©2017 IEEE


Inquiry and Word Count (LIWC) is a psychometrically validated lexicon, which associates words with given psychological concepts [8]. Some of these associations are obvious – the use of words like “sister” or “father” evokes a psychological concept of family, thus any text analyzed with LIWC containing these words would have the family LIWC concept associated with it. Some of the most interesting signals from this sort of linguistic analysis are non-obvious. For example, people who are challenged with depression tend to use first person pronouns (e.g., “I”) more frequently than matched controls [9], [10], [11].

With the uptake of social media, the data available for this manner of analysis has also dramatically increased. Recently, a community of interest around this work at the intersection of computational linguistics and clinical psychology has produced a wealth of interesting analytical and qualitative papers covering this space [12], [13], [14]. For brevity, we refer the interested reader to the archives of these conferences at http://clpsych.org.

B. Population-level analysis
Much of the potential power of this sort of analysis is its scalability and affordability, which makes the analysis of large populations possible. There are two general approaches for this population-level analysis:
• (1) Maintain models for each individual in a population and use aggregates of these models to make inferences about the population.
• (2) Aggregate the data for the population, then score this data once in aggregate.

Illustrative examples of (1) can be seen in work estimating suicidal ideation as compared to observed suicide rates [15] and post traumatic stress rates [16], both of geographic populations in the United States. Models to estimate which users were experiencing suicidal ideation (or post traumatic stress, respectively) from the language of their social media posts were created and deployed against a large geographic sample of Twitter data. The former segmented the population to look at demographics well-represented in Twitter (particularly women aged 14-24), while the latter made no attempt to control for demographic differences induced by studying Twitter users. Despite this, both provide evidence indicating known trends and demonstrate face-valid results.

There has been a wide array of progress in the sort of algorithms and models necessary to support analysis like (1) from linguistic signals. In the past few years, research examining the language usage on social media from users with a wide array of mental health conditions supports this approach. Research has been published on models for suicidal ideation and risk [17], [15], [18], [19], [20], [21], postpartum depression [22], clinical depression [23], anxiety, attention deficit hyperactivity (ADHD), bipolar, borderline personality, eating disorders, obsessive compulsive (OCD), post traumatic stress (PTSD), schizophrenia, and seasonal affective disorder [11]. A number of groups have also demonstrated the power of incorporating other information (e.g., temporal information) in with the linguistic signals for more robust classifiers [24] and a deeper understanding of mental health phenomena (e.g., related to onset, variability over time, or variability leading up to an emotional crisis) [25].

The second (2) approach to population level analysis has primarily been used to study populations defined by geography. For example, the psychological language of Twitter users has been shown to be more predictive of county-level risk of heart disease than the traditionally examined set of demographics (e.g., age, race, ethnicity, gender) for the condition [26]. Interestingly, even including these demographics with the language data provided no statistically significant benefit to the accuracy of risk prediction. Similarly, the psychological well-being of counties has also been assessed, showing improved accuracy of predicting life satisfaction over standard demographic and socio-economic data [27].

II. DATA
Frequently, the population is defined by geography, likely both for its convenience (Twitter makes it easy to collect geographic information) and its intuitive interpretations (states have different customs, regulations, and laws that influence the behavior of interest). We apply similar approaches to a different kind of population – a company. Many large enterprises are concerned with the well-being of their employees, especially those aspects that affect productivity and burnout. Given the progress of computational linguistics to quantify mental health signals, analysis of intracompany communications may provide some of this insight. We have been asked to analyze the internal chat, communications, and filesharing for a few groups, and show one such analysis in Figure 2.

III. METHODS
We use a textual emotion classifier from [17] and a sentiment classifier from [28] to analyze this internal company data. Each of these classifiers produces a probability distribution over the possible labels for each message (e.g., ‘positive’, ‘negative’, or ‘neutral’ sentiment). This allows us to aggregate all the messages sent on a given day by summing the probabilities associated with each outcome. We focus on the emotion- and sentiment-bearing portions of the data (i.e., ignoring those labeled with ‘no emotion’ or ‘neutral sentiment’). By excluding these from the analysis and focusing on relative proportions of messages with emotions/sentiment we factor out effects and fluctuations due to volume of communication activity alone. This, in effect, provides a barometer for the mix of emotions and sentiments expressed on a given day by the company.

IV. RESULTS
Many of these emotion and sentiment plots over time produce interesting results, consistent with what would be expected. Here, we present in detail a poignant anecdote that illustrates the possible insight from population-level analysis.
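The per-day aggregation described in Section III can be illustrated with a minimal sketch. It assumes each message arrives as a (day, label-to-probability) pair; the function name, the label strings, and the choice to drop the probability mass of the excluded labels (rather than dropping whole messages by their top label) are assumptions made for illustration, not details taken from the classifiers in [17] or [28].

```python
from collections import defaultdict

# Labels treated as carrying no emotion/sentiment (assumed label names).
EXCLUDED_LABELS = {"neutral", "no emotion"}


def daily_proportions(messages):
    """Sum per-message label probabilities by day, drop excluded labels,
    and renormalize so each day's remaining labels sum to 1.

    `messages` is an iterable of (day, {label: probability}) pairs.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for day, distribution in messages:
        for label, probability in distribution.items():
            if label not in EXCLUDED_LABELS:
                totals[day][label] += probability

    proportions = {}
    for day, label_sums in totals.items():
        mass = sum(label_sums.values())
        if mass > 0:  # skip days with no emotion- or sentiment-bearing mass
            proportions[day] = {
                label: value / mass for label, value in label_sums.items()
            }
    return proportions


# Hypothetical classifier outputs for three messages over two days.
example = [
    ("2016-11-01", {"positive": 0.6, "negative": 0.2, "neutral": 0.2}),
    ("2016-11-01", {"positive": 0.2, "negative": 0.6, "neutral": 0.2}),
    ("2016-11-02", {"positive": 0.1, "negative": 0.3, "neutral": 0.6}),
]
result = daily_proportions(example)
```

Because each day is renormalized over the remaining labels only, a day with many messages and a day with few are placed on the same scale, which is the volume-independence the section describes.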

The composition of emotions and sentiments is calculated for each day of communications for the company (regardless of author). A rolling mean over one week is plotted in each of the plots in Figure 2. This company exhibits increases in ‘joy’ around the major holidays and after the first major software release on a new project (top of Figure 2) and exhibits increases in negative sentiment leading up to each major deadline (bottom).

V. DISCUSSION
Examining the population-level analysis of a company’s data with straightforward NLP techniques yields some immediately interpretable and face-valid results. This company appears to have more posts with negative sentiment leading up to a big deliverable or deployment. While this does not explain all of the peaks in negative sentiment in Figure 2, it is clear that none of the deliverables come without an increase in negative sentiment in communications (likely related to stress). The peaks in joy around national holidays and the first deliverable on a particular project are noticeable and intuitively explainable – the inherent reward earned by reaching a professional goal that results in a functional product for the first time. Subsequent releases and updates don’t seem to have the same increases in joy, though.

This analysis was intended to be illustrative of the sort of population-level analysis now possible. While the classifiers used here were for emotion and sentiment, many of the mental health classifiers mentioned in the introduction are equally conducive to these sorts of analyses. They could provide information about population-level mental health at a previously impractical temporal granularity, for example.

VI. ETHICS
The ethics of using such aggregated public data to inform science have begun to be explored. Generally, the ethics and public opinion seem to support using public social media data to further mental health research [29], [30]. One natural extension of the individual-level analysis is for screening applications, though there are still questions remaining about the most ethical way to proceed. At its crux is a tension between saving lives of those at risk and violating their privacy (and the privacy of those estimated to be at risk) to do so. While the technology would support such screening applications, a public discourse over the most appropriate balance between screening and intervention is likely warranted.

VII. CAVEATS
This class of computational linguistic techniques does not work for everyone – in particular, many older individuals do not generate sufficient volume of textual data to permit adequate assessments. Consequently, this work focuses on digital natives who allow their communications to be digitally mediated. Ultimately, we expect as time progresses that the population of digital natives will continue to grow while the number of digital immigrants and those disconnected will continue to shrink.

VIII. CONCLUSION
Language is just one of many possible signals in the whitespace between healthcare interactions that can be used for the assessment and treatment of physical and mental health conditions. We advocate for the use of these signals in combination with signals already collected by the health care system, rather than as a replacement. Aligning these new data streams from the analysis of everyday language and behaviors can yield information that will empower clinicians and their patients to rapidly personalize treatment, assessment, diagnosis, and intervention, all at a scale not previously possible.

ACKNOWLEDGMENT
The authors would like to express their gratitude to the rest of Qntfy and the CLPsych community for continual interest and passion for advancing the science and understanding of mental health and well-being.

REFERENCES
[1] D. E. Bloom, E. Cafiero, E. Jané-Llopis, S. Abrahams-Gessel, L. R. Bloom, S. Fathima, A. B. Feigl, T. Gaziano, A. Hamandi, M. Mowafi, A. Pandya, K. Prettner, L. Rosenberg, B. Seligman, A. Stein, and C. Weinstein, “The global economic burden of noncommunicable diseases,” World Economic Forum, Geneva, Tech. Rep., 2011.
[2] SAMHSA, “Substance abuse and mental health services administration,” in Results from the 2013 National Survey on Drug Use and Health: Mental Health Findings, NSDUH Series H-49, HHS Publication No. (SMA) 14-4887, Rockville, MD, 2014.
[3] S. C. Curtin, M. Warner, and H. Hedegaard, “Increase in suicide in the united states, 1999-2014.” NCHS data brief, no. 241, pp. 1–8, 2016.
[4] M. K. Nock, G. Borges, E. J. Bromet, J. Alonso, M. Angermeyer, A. Beautrais, R. Bruffaerts, W. T. Chiu, G. De Girolamo, S. Gluzman, et al., “Cross-national prevalence and risk factors for suicidal ideation, plans and attempts,” The British Journal of Psychiatry, vol. 192, no. 2, pp. 98–105, 2008.
[5] E. Sullivan, J. L. Annest, F. Luo, T. Simon, and L. Dahlberg, “Suicide among adults aged 35–64 years, United States, 1999–2010,” Center for Disease Control and Prevention, Morbidity and Mortality Weekly Report, 2013.
[6] K. A. Padrez, L. Ungar, H. A. Schwartz, R. J. Smith, S. Hill, T. Antanavicius, D. M. Brown, P. Crutchley, D. A. Asch, and R. M. Merchant, “Linking social media and medical record data: a study of adults presenting to an academic, urban emergency department,” BMJ quality & safety, pp. bmjqs–2015, 2015.
[7] R. J. Smith, P. Crutchley, H. A. Schwartz, L. Ungar, F. Shofer, K. A. Padrez, and R. M. Merchant, “Variations in facebook posting patterns across validated patient health conditions: A prospective cohort study,” Journal of Medical Internet Research, vol. 19, no. 1, p. e7, 2017.
[8] J. W. Pennebaker, C. K. Chung, M. Ireland, A. Gonzales, and R. J. Booth, The development and psychometric properties of LIWC2007. Austin, TX: LIWC.net, 2007.
[9] C. Chung and J. Pennebaker, “The psychological functions of function words,” Social Communication, 2007.
[10] M. De Choudhury, M. Gamon, S. Counts, and E. Horvitz, “Predicting depression via social media,” in Proceedings of the 7th International AAAI Conference on Weblogs and Social Media (ICWSM), 2013.
[11] G. Coppersmith, M. Dredze, C. Harman, and K. Hollingshead, “From ADHD to SAD: Analyzing the language of mental health on Twitter through self-reported diagnoses,” in Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. Denver, Colorado, USA: North American Chapter of the Association for Computational Linguistics, June 2015.
[12] P. Resnik, R. Resnik, and M. Mitchell, Eds., Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. Baltimore, Maryland, USA: Association for Computational Linguistics, June 2014. [Online]. Available: http://www.aclweb.org/anthology/W14-32

Fig. 2. Rolling mean over a one week window of the proportion of messages with emotion that were tagged as “joy” (top) and those with sentiment tagged as “negative” (bottom) from a group’s internal communications data. Major deadlines are overlaid as dotted lines and major holidays are overlaid as dashed lines.
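A one-week rolling mean of a daily proportion series, of the kind plotted in Fig. 2, might be computed along the following lines. This is a sketch under stated assumptions: the function name is hypothetical, the window is taken to be trailing (the current day plus the six preceding days), and days without data are skipped rather than counted as zero. The paper does not specify any of these choices.

```python
from datetime import date, timedelta


def rolling_weekly_mean(daily_values, window_days=7):
    """Trailing mean of a daily series over `window_days` calendar days.

    `daily_values` maps datetime.date -> float. Days with no data are
    skipped rather than treated as zero (an assumption of this sketch).
    """
    smoothed = {}
    for day in sorted(daily_values):
        window = [
            daily_values[day - timedelta(days=offset)]
            for offset in range(window_days)
            if day - timedelta(days=offset) in daily_values
        ]
        # The window always contains at least the current day itself.
        smoothed[day] = sum(window) / len(window)
    return smoothed


# Hypothetical daily share of emotion-bearing messages tagged as "joy".
start = date(2016, 11, 1)
joy_share = {start + timedelta(days=i): i / 10 for i in range(10)}
smoothed = rolling_weekly_mean(joy_share)
```

Smoothing over a full week also averages out any day-of-week pattern in communication, which may be one reason a seven-day window is a natural choice for workplace data.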

[13] M. Mitchell, G. Coppersmith, and K. Hollingshead, Eds., Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. Denver, Colorado, USA: North American Association for Computational Linguistics, June 2015.
[14] K. Hollingshead and L. Ungar, Eds., Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. San Diego, California, USA: North American Association for Computational Linguistics, June 2016.
[15] G. Coppersmith, R. Leary, E. Whyne, and T. Wood, “Quantifying suicidal ideation via language usage on social media,” in Joint Statistics Meetings Proceedings, Statistical Computing Section. JSM, 2015.
[16] G. Coppersmith, C. Harman, and M. Dredze, “Measuring post traumatic stress disorder in Twitter,” in Proceedings of the 8th International AAAI Conference on Weblogs and Social Media (ICWSM), 2014.
[17] G. Coppersmith, K. Ngo, R. Leary, and T. Wood, “Exploratory data analysis of social media prior to a suicide attempt,” in Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. San Diego, California, USA: North American Chapter of the Association for Computational Linguistics, June 2016.
[18] A. Wood, J. Shiffman, R. Leary, and G. Coppersmith, “Discovering shifts to suicidal ideation from mental health content in social media,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2016.
[19] E. Kiciman, M. Kumar, G. Coppersmith, M. Dredze, and M. De Choudhury, “Discovering shifts to suicidal ideation from mental health content in social media,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2016.
[20] A. Cohan, S. Young, and N. Goharian, “Triaging mental health forum posts,” in Proceedings of the 3rd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, San Diego, California, USA, June, vol. 16, 2016.
[21] M. Kumar, M. Dredze, G. Coppersmith, and M. De Choudhury, “Detecting changes in suicide content manifested in social media following celebrity suicides,” in Proceedings of the 26th ACM conference on Hypertext and hypermedia. ACM, 2015.
[22] M. De Choudhury, S. Counts, and E. Horvitz, “Predicting postpartum changes in emotion and behavior via social media,” in Proceedings of the ACM Annual Conference on Human Factors in Computing Systems (CHI). ACM, 2013, pp. 3267–3276.
[23] H. A. Schwartz, J. Eichstaedt, M. L. Kern, G. Park, M. Sap, D. Stillwell, M. Kosinski, and L. Ungar, “Towards assessing changes in degree of depression through Facebook,” in Proceedings of the ACL Workshop on Computational Linguistics and Clinical Psychology, 2014.
[24] P. Resnik, W. Armstrong, L. Claudino, T. Nguyen, V.-A. Nguyen, and J. Boyd-Graber, “The University of Maryland CLPsych 2015 shared task system,” in Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. Denver, Colorado, USA: North American Chapter of the Association for Computational Linguistics, June 2015.
[25] K. Hollingshead, H. A. Schwartz, G. Coppersmith, F. Almoradasi, A. Benton, J. Craley, P. Crutchley, D. Hovy, M. Ireland, B. S. Kim, L. Kim, R. Merchant, M. Mitchell, P. Resnik, M. Rouhizadeh, and L. Ungar, “Detecting risk and protective factors of mental health using social media,” Center for Language and Speech Processing Technical Reports, in prep.
[26] J. C. Eichstaedt, H. A. Schwartz, M. L. Kern, G. Park, D. R. Labarthe, R. M. Merchant, S. Jha, M. Agrawal, L. A. Dziurzynski, M. Sap, C. Weeg, E. E. Larson, L. H. Ungar, and M. E. P. Seligman, “Psychological language on Twitter predicts county-level heart disease mortality,” Psychological Science, vol. 26, no. 2, pp. 159–169, 2015.
[27] H. A. Schwartz, J. C. Eichstaedt, M. L. Kern, L. Dziurzynski, R. E. Lucas, M. Agrawal, G. J. Park, S. K. Lakshmikanth, S. Jha, M. E. P. Seligman, and L. H. Ungar, “Characterizing geographic variation in well-being using tweets,” in Proceedings of the 8th International AAAI Conference on Weblogs and Social Media (ICWSM), 2013.
[28] C. J. Hutto and E. Gilbert, “Vader: A parsimonious rule-based model for sentiment analysis of social media text,” in Eighth International AAAI Conference on Weblogs and Social Media, 2014.
[29] M. Conway, “Ethical issues in using twitter for public health surveillance and research: developing a taxonomy of ethical concepts from the research literature,” Journal of medical Internet research, vol. 16, no. 12, p. e290, 2014.
[30] J. Mikal, S. Hurst, and M. Conway, “Ethical issues in using twitter for population-level depression monitoring: a qualitative study,” BMC medical ethics, vol. 17, no. 1, p. 1, 2016.

