Text Mining Methods For The Characterisation of Suicidal Thoughts and Behaviour

Psychiatry Research 322 (2023) 115090
Alba Sedano-Capdevila a, Mauricio Toledo-Acosta b, María Luisa Barrigon c, d,
Eliseo Morales-González b, David Torres-Moreno b, Bolívar Martínez-Zaldivar b,
Jorge Hermosillo-Valadez b, Enrique Baca-García a, c, e, f, g, h, i, j, *, MEmind Study Group
a Department of Psychiatry, University Hospital Rey Juan Carlos, Mostoles, Spain
b Centro de Investigación en Ciencias, Universidad Autónoma del Estado de Morelos, 62209 Cuernavaca, Morelos, México
c Department of Psychiatry, University Hospital Jimenez Diaz Foundation, Madrid, Spain
d Institute of Psychiatry and Mental Health, Hospital General Universitario Gregorio Marañón, Madrid, Spain
e Department of Psychiatry, General Hospital of Villalba, Madrid, Spain
f Department of Psychiatry, University Hospital Infanta Elena, Valdemoro, Spain
g Department of Psychiatry, Madrid Autonomous University, Madrid, Spain
h CIBERSAM (Centro de Investigación en Salud Mental), Carlos III Institute of Health, Madrid, Spain
i Universidad Católica del Maule, Talca, Chile
j Department of Psychiatry, Centre Hospitalier Universitaire de Nîmes, France
ARTICLE INFO

Keywords: Suicide; Suicidal ideation; Suicide attempt; Natural language processing; Machine learning; Mobile health

ABSTRACT

Traditional research methods have shown low predictive value for suicidal risk assessment and are of limited applicability in clinical practice. The authors sought to evaluate natural language processing as a new tool for assessing self-injurious thoughts and behaviors and related emotions.
We used the MEmind project to assess 2838 psychiatric outpatients. Anonymous unstructured responses to the open-ended question “how are you feeling today?” were collected according to their emotional state. Natural language processing was used to process the patients’ writings. The texts were automatically represented (corpus) and analyzed to determine their emotional content and degree of suicidal risk. The authors compared the patients’ texts with a question used to assess lack of desire to live, as a suicidal risk assessment tool.
The corpus consists of 5,489 short free-text documents containing 12,256 tokenized or unique words. Natural language processing showed a ROC-AUC score of 0.9638 when compared with the responses to the lack of a desire to live question.
Natural language processing shows encouraging results for classifying subjects according to their desire not to live as a measure of suicidal risk using patients’ free texts. It is also easily applicable to clinical practice and facilitates real-time communication with patients, allowing better intervention strategies to be designed.
1. Introduction

Studying and predicting suicidal behavior are major challenges for mental health professionals. More than 700,000 people die by suicide in the world each year (World Health Organization, 2021). In Spain, 2020 was the year with the highest number of deaths by suicide (3,941), an increase of 7.4% compared to 2019 (Instituto Nacional de Estadística (INE), 2021).
Self-injurious thoughts and behaviors (STB) encompass a range of constructs such as suicidal ideation (SI), suicide attempts, and non-suicidal self-injury. The interest of these behaviors is underscored by their higher incidence compared with suicide and by their being an earlier step in the suicidal process, in which intervention or prevention is still possible (Fazel & Runeson, 2020; Macrynikola et al., 2018; Turecki et al., 2019). Passive SI involves the desire to die or the lack of a desire to live without planning self-injury. In the traditional continuum model of suicide, passive SI is considered an entity of lesser severity compared to SI or suicide attempts (Crosby et al., 1999; Kessler et al., 1999; LeMaster et al., 2004; Linden and Barnow, 1997; Yip et al., 2003). However, later studies identify passive SI as an important marker of suicidal risk (SR)
* Corresponding author at: Department of Psychiatry, Hospital Universitario Fundación Jiménez Díaz, Madrid Spain.
E-mail address: EBaca@quironsalud.es (E. Baca-García).
https://doi.org/10.1016/j.psychres.2023.115090
Received 26 September 2022; Received in revised form 23 January 2023; Accepted 28 January 2023
Available online 5 February 2023
0165-1781/© 2023 Elsevier B.V. All rights reserved.
and equate its usefulness with that of active SI (Liu et al., 2020), or suggest that the combination of both is the best predictor (Baca-Garcia et al., 2011).
A variety of tools have been created to assess the risk of suicide. However, the predictive ability of scales and questionnaires is poor (Quinlivan et al., 2016). For example, the SAD PERSONS Scale (Patterson et al., 1983) had a sensitivity of 15% and a specificity of 97%, and the Manchester Self-Harm Rule (Cooper et al., 2006) had a sensitivity of 97% and a specificity of 20% (Large et al., 2016, 2018; Quinlivan et al., 2016; Runeson et al., 2017). Furthermore, other limitations that must be taken into account when using scales include the fact that the Columbia Suicide Severity Rating Scale (Posner et al., 2007) does not encompass the full spectrum of SI (Giddens et al., 2014) and that a minimum level of patient literacy is necessary for Suicide Ideation Scale (Beck et al., 1979) administration (Batterham et al., 2015). Thus, the clinical interview is still regarded as the gold standard for assessing SR (Links and Hoffman, 2005; American Psychiatric Association, 2006), and there is general agreement that scales should not be used in place of the interview or clinical judgment but rather as a support or complement (Baca-Garcia et al., 2011). Specifically, the World Health Organization, in its suicide prevention program document, recommends a gradual approach to suicidal patients, asking about lack of desire to live before asking directly about SI (Saxena et al., 2014).
These results have led researchers to consider new methodologies for SR assessment. Natural language processing (NLP) can be used to analyze large volumes of unprocessed texts and use them as predictive elements (Krahmer, 2010; Velupillai et al., 2019; Walker, 1981). NLP has shown promising results as a tool for studying suicidal behavior, comparing favorably against other traditional methods (Cook et al., 2016; Levis et al., 2020; Metzger et al., 2017; J. P. Pestian et al., 2016; Velupillai et al., 2019).
Our aim in this study was to evaluate the usefulness of NLP as a tool for assessing SR and emotions related to STB from patients’ free text. In line with the first studies on the use of NLP to predict SR, we hypothesized that it would prove to be a suitable tool in clinical practice for analyzing and predicting lack of a desire to live as a measure of SR.

2. Methods

2.1. Sampling method and corpus description

The study included 2838 adult outpatients who had attended any of the psychiatric services within the Psychiatry Department of Hospital Fundación Jiménez Díaz in Madrid, Spain, from May 2014 to May 2015. The Department comprises six community mental health centers and is part of the Spanish National Health Service, providing tax-funded medical care to a catchment area of approximately 850,000 people. All patients receiving follow-up care at the centers were eligible for the study. Inclusion criteria were male or female outpatients, aged 18 or older, who provided written informed consent. Exclusion criteria were insufficient literacy, refusal to participate, current imprisonment, being under guardianship, and emergency situations in which the patient’s state of health did not allow for written informed consent. The study was carried out in compliance with the Declaration of Helsinki and approved by the Fundación Jiménez Díaz research ethics committee (PIC 76-2013 FJD-HIE-HRJC), approved on 01/28/2014 and assigned the number 01/14.
Specifically, the patients included in this study were those who used the free text field of the MEmind Wellness Tracker (Barrigón et al., 2017). MEmind is a web application accessible from computer, tablet and smartphone. Following specialized training, all of the Psychiatry Department’s clinicians began using this tool in clinical practice as of May 2014. MEmind has two interfaces: one for clinicians, where diagnostic information and other clinical data were recorded, and another for patients, where they are asked to complete different questionnaires and also have a space for free text titled “how are you feeling today?”.

2.2. Psychiatric knowledge base

We compared two measures of information collected from patients. Firstly, the texts analyzed (corpus) are anonymous unstructured answers to an open-ended question related to the participant’s current emotional state: “how are you feeling today?”. Participants were allowed to respond to the question up to once per day and were instructed to answer as many days as they wished (Barrigón et al., 2017; Berrouiguet et al., 2019; Cook et al., 2016; Toledo-Acosta et al., 2021). Secondly, participants were also prompted to answer a question about SR. We explored passive thoughts of death (lack of desire to live) as a means of assessing SR. Thus, each participant answered the question “Have you ever felt that you had no desire to live?” using a 6-point Likert scale offering the following options: 0: “all the time”, 1: “most of the time”, 2: “more than half of the time”, 3: “less than half of the time”, 4: “occasionally”, and 5: “never”. We divided the sample into six groups based on this question.
Some words were evaluated separately without considering the rest of the text in which they were included. Each word was scored by two clinical psychiatrists with 8 and 16 years of experience respectively (ASC and MLB) according to its relationship to risk or protective factors for suicide (Donald et al., 2006). If the words were related to risk factors for suicide, they were assigned a negative score. If they were related to protective factors for suicide, they were assigned a positive score. We also considered whether the words referred to positive or negative emotions. When a discrepancy in the score was found, a senior psychiatrist, expert in suicidology (EBG), resolved it. The absolute value assigned was decided on the basis of the strength of the association with the protective and risk factors.
In this way, the emotional aspect is linked to this predefined list of prototypical words, where each word is accompanied by what we call a clinical score ranging from -5 to +5, based on psychiatric knowledge. A list of the most common positive and negative words is shown in the supplementary material. Intuitively, the emotional weight of each word must depend on the lexical collocations in each message (the association of the word with “positive” or “negative” words). Hence, the intuition behind our scoring system is to use this prototypical list of words as a knowledge base about the possible emotional states of the subjects, and to exploit the distributional semantics properties of the embedding vector space in order to understand the behavior of distinct linguistic patterns.

2.3. Text mining methods

Our approach consisted of modeling and analyzing the linguistic patterns present in free-writing texts, under the hypothesis that the lexical associations found in them can reveal clearly typified emotional states. The representation of these texts at the word and message content levels allows both for the observation of different patterns of language use in contexts of daily life and for the construction of predictive models based on this use and its association with different emotional states. Within NLP, the vast amount of research into the issue of automatically determining the emotional content of a written message falls under the umbrella term of sentiment analysis (Ge-Stadnyk et al., 2017; Serrano-Guerrero et al., 2015). Our methodology consisted of representing and analyzing linguistic patterns in free-writing texts through the development of text mining tools based on the distributional hypothesis, which states that words sharing a similar context have a similar meaning. Thus, we developed scoring methods to account for the cognitive emotional weight carried by each word in the corpus (Toledo-Acosta et al., 2020).

3. Results

The 2838 patients who reported data had a mean age of 47.2 years and were mostly female (62.0%). The most represented diagnoses in the
participants were anxiety-related disorders (49.0%) and mood disorders (23.5%) (Barrigón et al., 2017).
The corpus consists of 5489 short free-form texts and 12,256 tokenized words. The minimum length of the responses was one word, the maximum length was 77 words, and the average number of words per text was 21.
The first results we observed were those obtained by means of the word embedding representation technique. A word embedding is a point in a vector space representing the word. The coordinates of the embedding are generated by a neural network algorithm. The word embedding algorithm used in this paper is word2vec (Mikolov et al., 2013). We later modified the word embeddings using our representation method. Fig. 1, with the most frequent words per tag, shows word clouds for each tag associated with a level of emotion reported in the messages. The size of each word is directly proportional to its frequency. The words with the highest frequency throughout all the message labels were “day”, “feel”, “today”, “bad”, and “good”.
Fig. 2 shows word scores per tag, represented with clouds of scored words for each tag in the messages. In this case, the size of the words is directly proportional to the absolute value of their score. The word clouds associated with lower scores of lack of a desire to live (negative labels) are colored in red, while the word clouds in messages with a higher score (positive labels) appear in green. The words with the lowest score (most negative) are drug names and the words “die”, “had”, “suicide”, “pill”, and “dose”. On the other hand, the words with the highest score (most positive) are those related to family and the words “live”, “life”, and “friend”.
Fig. 3 shows the distribution of scores in the messages in each of the labels. In this violin plot of scores per label, the vertical axis represents the scores of words in each label. This score is a number between -1 (negative score, colored in red) and +1 (positive score, colored in green) accounting for the emotional polarity of words. The horizontal axis represents the response labels. Messages with the two most negative labels contain only negative or null scores and, analogously, messages with the two most positive labels contain only positive or null scores. Messages with intermediate scores show a mixed behavior, combining negative and positive values. We observed that labels with lower scores (lack of a desire to live all the time / most of the time) concentrate negative or neutral words, whereas the opposite happens for labels with high scores (lack of a desire to live never / occasionally), where only positive or neutral words appear.
This nearly dichotomous behavior enabled us to obtain very encouraging classification results, with a ROC-AUC score of 0.964, represented in Fig. 4. The Area Under the Curve (AUC) is an overall summary of diagnostic accuracy. AUC equals 0.5 when the ROC curve corresponds to random chance and 1.0 for perfect accuracy (Zou et al., 2007). The classification was done using a Convolutional Neural Network, and the AUC score for this multi-class classification was computed by averaging the AUC of each class against the rest. These results reveal the quality of our scoring and representation systems. It is worth noting that the representation algorithms were applied to the whole corpus, i.e., to 100% of the messages. In order to validate these representations, the corpus was divided into two parts for the message classification task: 80% of the total number of messages was used for classifier training and the other 20% for testing. It is with this 20% that the performance of the classifier was evaluated by calculating the ROC-AUC.
We show in Fig. 5 a graph of clusters, the similarities between clusters, and their association of words by topic. Our clustering method groups words together if their vectors are close to each other; this is done using a density criterion (Ester et al., 1996; Toledo-Acosta et al., 2020). Each topic was determined based on the content of the clusters (Toledo-Acosta et al., 2020).
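As an illustration of the density criterion mentioned above, the following minimal Python sketch implements the DBSCAN algorithm of Ester et al. (1996) with cosine distance on a handful of toy word vectors. The words, the two-dimensional vectors, and the parameters eps and min_pts are hypothetical and chosen only for illustration; the clustering actually used in this study follows Toledo-Acosta et al. (2020) and may differ in its details.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; close to 0 when two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def dbscan(vectors, eps, min_pts):
    """Minimal DBSCAN: returns one cluster id per vector, -1 meaning noise."""
    n = len(vectors)
    # Neighborhood of each point (a point is always its own neighbor).
    neighbors = [
        [j for j in range(n) if cosine_distance(vectors[i], vectors[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels


# Hypothetical toy embeddings (not taken from the study corpus).
words = ["sleep", "insomnia", "night", "mother", "family", "son", "pill"]
vectors = [
    (1.0, 0.1), (0.9, 0.2), (1.0, 0.0),   # sleep-related words
    (0.1, 1.0), (0.2, 0.9), (0.0, 1.0),   # family-related words
    (-1.0, -1.0),                         # isolated word
]
labels = dbscan(vectors, eps=0.05, min_pts=2)
for word, label in zip(words, labels):
    print(word, label)
```

In this toy example the sleep-related and family-related words form two separate clusters and the isolated word is labeled as noise (-1), which mirrors how dense groups of nearby word vectors can be read as topics.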
Fig. 1. Most frequent words per tag.
Fig. 2. Word scores per tag.
Fig. 3. Violin plot of prescores per label.
Fig. 4. ROC curves, macro and micro average.
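The macro and micro averages of the ROC curves in Fig. 4 can be illustrated with a short sketch. Assuming the classifier outputs one score per class for each message, the one-vs-rest AUC of a class is the probability that a message of that class receives a higher score for it than a message of any other class. The macro average is the mean of these per-class values, which corresponds to averaging the AUC of each class against the rest, and the micro average pools all (message, class) pairs into a single binary problem. The labels and scores below are hypothetical toy values, not data from this study, and the function names are illustrative.

```python
from itertools import product


def binary_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, y_prob, n_classes):
    """Mean of the one-vs-rest AUCs, plus the per-class values."""
    per_class = []
    for k in range(n_classes):
        labels = [1 if y == k else 0 for y in y_true]
        scores = [row[k] for row in y_prob]
        per_class.append(binary_auc(labels, scores))
    return sum(per_class) / n_classes, per_class


def micro_ovr_auc(y_true, y_prob, n_classes):
    """Pool every (message, class) pair into one binary problem."""
    labels = [1 if y == k else 0 for y in y_true for k in range(n_classes)]
    scores = [row[k] for row in y_prob for k in range(n_classes)]
    return binary_auc(labels, scores)


# Hypothetical toy data: 6 messages, 3 classes, one score per class.
y_true = [0, 0, 1, 1, 2, 2]
y_prob = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.5, 0.3, 0.2],  # a class-1 message scored highest for class 0
    [0.1, 0.2, 0.7],
    [0.2, 0.3, 0.5],
]
macro, per_class = macro_ovr_auc(y_true, y_prob, 3)
micro = micro_ovr_auc(y_true, y_prob, 3)
print(per_class, macro, micro)
```

On this toy data the per-class AUCs are 0.9375, 0.875, and 1.0, giving a macro average of 0.9375, while the micro average is 69/72, approximately 0.958. The two averages differ because the micro average weights every (message, class) pair equally rather than every class.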
The graph shows the clusters of words after applying the representation algorithm. The size of each node is proportional to the diameter of the cluster. The size of each edge is proportional to the similarity between the centroids of the clusters it connects, and the labels on the edges correspond to a similarity metric between the centroids. The similarity metric indicates how close the centroids of the clusters are to each other; it is a number between -1 (furthest) and 1 (closest). The edges were drawn only if this similarity metric was higher than 85.0%. The closer two word vector representations are to each other, the higher the frequency of co-occurrence of these two words in the corpus. It is worth noting that this graph shows that some clusters are very close to each other.

4. Discussion

In this work, we divided the free texts of psychiatric outpatients according to the response to the question about lack of desire to live (as a measure of suicidality). The words represented in each label, their frequency and emotional charge were found to correlate with what is expected in clinical practice. These similarities bring NLP closer to being an easily understandable tool for mental health professionals. In addition,
the method is capable of classifying subjects’ words according to STB in real time with a ROC-AUC score of 0.964.
Fig. 5. Graph of clusters and similarities between clusters.
As seen in previous work, NLP is a tool easily applicable to routine clinical practice (Cook et al., 2016; Levis et al., 2022). After NLP analysis using the word embedding representation technique, we observed that the most frequently used words correlate with what clinicians observe when assessing suicidal risk. As expected, emotions with a negative connotation appear in the labels with low scores (representing scores of lack of a desire to live) and those that are semantically positive appear in the labels with higher scores.
We also found recent and short temporal terms such as “day” or “today” represented among patients’ concerns. This suggests that the descriptions of emotions, feelings, and triggers of suicide refer to recent times and do not go back to situations distant in time. The literature has demonstrated that emotions and SI fluctuate over short periods of time according to life events or situations (Czyz et al., 2018a; Hallensleben et al., 2019; Kleiman et al., 2017; Mou et al., 2018).
In the analysis considering the emotional scores, the words represented in each label were examined according to lack of desire to live. The most negative words (related to the term death) are represented in the labels with low scores. Positive words (related to life and family) are represented in the positive labels, with higher scores. Previous work by McCoy et al. assigned a positive or negative valence to texts analyzed by NLP. If the method identified a positive valence in the text, this was associated with a 30% reduction in the risk of suicide (McCoy et al., 2016).
Regarding topics, the following clusters were identified: sleep, perspective, relationships, aggressiveness, health, names, places, and drugs. Patients with chronic diseases frequently refer to suicide in consultation. The same happens in the case of insomnia, and researchers have shown interest in both aspects (Kirtley et al., 2020; Littlewood et al., 2019; Lopez-Castroman and Jaussent, 2020). Relationships have been previously mentioned as influencing factors in SI, as both triggers and protective factors. When describing SI, patients often talk about the plan they had in mind. They speak in particular of the drugs they would take or of the self-injurious or aggressive acts they would carry out. History of aggression is less commonly explored in consultation, but the literature has related it to suicidal behavior (Fontanella and Campo, 2020; Holliday et al., 2021; Segal et al., 2021).
Patients were classified into groups based on their words with a ROC-AUC score of 0.964 when compared to the responses to the lack of a desire to live question. Traditionally used instruments have a low predictive value for identifying groups of patients according to their SR. Almost all instruments fall short: a recent meta-analysis shows a mean sensitivity of 56% and specificity of 79% (Large et al., 2016). Other studies have shown that higher AUC values were found for studies assessing SI than for studies assessing death by suicide (Bernert et al., 2020; Levis et al., 2022). These new results shown by NLP are very encouraging both from a research point of view and for use in the clinical setting.
Previous works on the use of NLP to study suicide have mainly analyzed clinicians’ notes from electronic medical records (Barak-Corren et al., 2017; Levis et al., 2020, 2022; McCoy et al., 2016; Tran et al., 2014). New predictive models have been analyzed as tools for assessing SR, and studies have attempted to compare these new methodologies with traditional ones. In a study with an 18-item scale, machine learning showed better area under the ROC curve (AUC) scores, 0.73-0.79 versus 0.55-0.59 for manual analysis by clinicians (Tran et al., 2014). NLP was also shown to be more effective when combined with traditional methods, with small improvements in prediction (8%) compared to previous methods (Levis et al., 2020).
Only a few studies like ours have used patients’ text. Pestian et al. (2010) compared elicited and genuine suicide notes. Machine learning showed 74% accuracy in identifying true suicide notes. Manual classification by mental health professionals obtained lower accuracy (63% and 49%) (J. Pestian et al., 2010). In the case of transcripts of completed interventions, other studies rate positively the ability to differentiate between suicidal texts and other texts (with an accuracy ranging from 80 to 89%) or describe NLP as a method with potential to identify subjects at risk of suicide (Oseguera et al., 2017; J. P. Pestian et al., 2016).
Our study has some strengths. It is novel in analyzing free text generated by the patient in real time to detect symptoms as they occur. It also allows us to explore the patient in a naturalistic setting by taking as many measurements as the patient wants or needs. This flexibility of communication facilitates patient access to psychiatric care at any time and in cases of limited access, as in some parts of the world (WHO, 2017). The ability to filter information and sort it by high-SR groups and situations allows intervention efforts to be targeted to the most severe cases. So far, we have not been able to develop interventions before suicidal behaviors occur (Allen et al., 2013; Choi et al., 2012). Methodologies to identify crisis episodes and high-risk situations will improve intervention systems.
On the other hand, our study has limitations. The sample is made up of patients seen in mental health services. We found some biases in these samples of e-health studies, such as an underrepresentation of patients with psychotic disorders (Lopez-Morinigo et al., 2021). In terms of methodology, some limitations were also found. First, clinicians had difficulties scoring certain words. Words were scored in isolation and, depending on the context, could have several meanings. These questionable scores were assigned by consensus. Second, we assessed suicidality with a question of our own design that explores the desire not to live. This question and its Likert scale response are not a validated method of SR assessment. Finally, this is a cross-sectional study, and conclusions about predictive value should be restrained.
In conclusion, our results obtained using NLP mirror what is observed in routine clinical practice. Moreover, from the emotional analysis, we were able to classify patients according to their lack of a desire to live with a ROC-AUC score of 0.964. These are promising results compared to traditional methods of SR assessment.
This methodology of analyzing patients’ free text facilitates the collection of data and variations regarding suicidal behavior in real time. It also facilitates patient access to clinicians regardless of problematic issues such as distance, consultation frequency, or mental health investment policies.
Future lines of research could be directed towards methods that compare NLP with tools assessing SR. In addition, to improve the prediction capacity of the method, the patients themselves could be the ones to score the words according to their emotional charge in relation to suicide. The perception of greater emotional nuances and the identification of the most at-risk patients from linguistic analysis would lead
to the design of more effective real-time crisis intervention projects.

Ethics approval

Approved.

Consent to participate

All signed.

Consent for publication

Not applicable.

Availability of data and material

Upon request.

Code availability

Not applicable.

Author statement

This manuscript has not been published and is not under consideration for publication elsewhere. We have no conflict of interest to disclose. All authors have approved the manuscript and agree with its submission to Psychiatry Research.

Declaration of Competing Interest

Enrique Baca-Garcia has designed MEmind.

Financial support

Research was partially funded by CONACYT Project A1-S-24213 of Basic Science and CONACYT grants 28268 and 30053, by the Instituto de Salud Carlos III jointly with the European Commission (ERDF) (ISCIII PI16/01852), by American Foundation for Suicide Prevention (LSRG-1-005-16) and by the Madrid Regional Government (B2017/BMD-3740 AGES-CM 2CM; Y2018/TCS-4705 PRACTICO-CM).

Acknowledgments

The authors acknowledge Oliver Shaw, who helped in editing this article.

Supplementary materials

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.psychres.2023.115090.

References

Allen, M.H., Abar, B.W., McCormick, M., Barnes, D.H., Haukoos, J., Garmel, G.M., Boudreaux, E.D., 2013. Screening for suicidal ideation and attempts among emergency department medical patients: instrument and results from the Psychiatric Emergency Research Collaboration. Suicide Life Threat. Behav. 43 (3), 313–323. https://doi.org/10.1111/sltb.12018.
Baca-Garcia, E., Perez-Rodriguez, M.M., Oquendo, M.A., Keyes, K.M., Hasin, D.S., Grant, B.F., Blanco, C., 2011. Estimating risk for suicide attempt: are we asking the right questions? Passive suicidal ideation as a marker for suicidal behavior. J. Affect. Disord. 134 (1-3), 327–332. https://doi.org/10.1016/j.jad.2011.06.026.
Barak-Corren, Y., Castro, V.M., Javitt, S., Hoffnagle, A.G., Dai, Y., Perlis, R.H., Nock, M.K., Smoller, J.W., Reis, B.Y., 2017. Predicting suicidal behavior from longitudinal electronic health records. Am. J. Psychiatry 174 (2), 154–162. https://doi.org/
Rodríguez, A., Baca-García, E., MEmind study group, 2017. User profiles of an electronic mental health tool for ecological momentary assessment: MEmind. Int. J. Methods Psychiatr. Res. 26 (1) https://doi.org/10.1002/mpr.1554.
Batterham, P.J., Ftanou, M., Pirkis, J., Brewer, J.L., Mackinnon, A.J., Beautrais, A., Fairweather-Schmidt, A.K., Christensen, H., 2015. A systematic review and evaluation of measures for suicidal ideation and behaviors in population-based research. Psychol. Assess. 27 (2), 501–512. https://doi.org/10.1037/pas0000053.
Beck, A.T., Kovacs, M., Weissman, A., 1979. Assessment of suicidal intention: the scale for suicide ideation. J. Consult. Clin. Psychol. 47 (2), 343–352. https://doi.org/10.1037//0022-006x.47.2.343.
Bernert, R.A., Hilberg, A.M., Melia, R., Kim, J.P., Shah, N.H., Abnousi, F., 2020. Artificial intelligence and suicide prevention: a systematic review of machine learning investigations. Int. J. Environ. Res. Public Health 17 (16), E5929. https://doi.org/10.3390/ijerph17165929.
Berrouiguet, S., Barrigón, M.L., Castroman, J.L., Courtet, P., Artés-Rodríguez, A., Baca-García, E., 2019. Combining mobile-health (mHealth) and artificial intelligence (AI) methods to avoid suicide attempts: the Smartcrises study protocol. BMC Psychiatry 19 (1), 277. https://doi.org/10.1186/s12888-019-2260-y.
Choi, J.W., Park, S., Yi, K.K., Hong, J.P., 2012. Suicide mortality of suicide attempt patients discharged from emergency room, nonsuicidal psychiatric patients discharged from emergency room, admitted suicide attempt patients, and admitted nonsuicidal psychiatric patients. Suicide Life Threat. Behav. 42 (3), 235–243. https://doi.org/10.1111/j.1943-278X.2012.00085.x.
Cook, B.L., Progovac, A.M., Chen, P., Mullin, B., Hou, S., Baca-Garcia, E., 2016. Novel use of natural language processing (NLP) to predict suicidal ideation and psychiatric symptoms in a text-based mental health intervention in Madrid. Comput. Math. Methods Med. 2016, 8708434 https://doi.org/10.1155/2016/8708434.
Cooper, J., Kapur, N., Dunning, J., Guthrie, E., Appleby, L., Mackway-Jones, K., 2006. A clinical tool for assessing risk after self-harm. Ann. Emerg. Med. 48 (4), 459–466. https://doi.org/10.1016/j.annemergmed.2006.07.944.
Crosby, A.E., Cheltenham, M.P., Sacks, J.J., 1999. Incidence of suicidal ideation and behavior in the United States, 1994. Suicide Life Threat. Behav. 29 (2), 131–140.
Czyz, E.K., King, C.A., Nahum-Shani, I., 2018a. Ecological assessment of daily suicidal thoughts and attempts among suicidal teens after psychiatric hospitalization: Lessons about feasibility and acceptability. Psychiatry Res. 267, 566–574. https://doi.org/10.1016/j.psychres.2018.06.031.
Donald, M., Dower, J., Correa-Velez, I., Jones, M., 2006. Risk and protective factors for medically serious suicide attempts: a comparison of hospital-based with population-based samples of young adults. Aust. N. Z. J. Psychiatry 40 (1), 87–96. https://doi.org/10.1080/j.1440-1614.2006.01747.x.
Ester, M., Kriegel, H., Sander, J., Xu, X., 1996. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. KDD.
Fazel, S., Runeson, B., 2020. Suicide. N. Engl. J. Med. 382 (3), 266–274. https://doi.org/10.1056/NEJMra1902944.
Fontanella, C.A., Campo, J.V., 2020. Child abuse and neglect contributing to youth suicide-reply. JAMA Pediatrics 174 (12), 1214–1215. https://doi.org/10.1001/jamapediatrics.2020.2576.
Ge-Stadnyk, J., Alonso-Vazquez, M., Gretzel, U., 2017. Sentiment analysis: a review.
Giddens, J.M., Sheehan, K.H., Sheehan, D.V., 2014. The Columbia-Suicide Severity Rating Scale (C-SSRS): has the «Gold Standard» Become a Liability? Innov. Clin. Neurosci. 11 (9-10), 66–80.
Hallensleben, N., Glaesmer, H., Forkmann, T., Rath, D., Strauss, M., Kersting, A., Spangenberg, L., 2019. Predicting suicidal ideation by interpersonal variables, hopelessness and depression in real-time. An ecological momentary assessment study in psychiatric inpatients with depression. Eur. Psychiatry 56 (1), 43–50. https://doi.org/10.1016/j.eurpsy.2018.11.003.
Holliday, R., Forster, J.E., Schneider, A.L., Miller, C., Monteith, L.L., 2021. Interpersonal violence throughout the lifespan: associations with suicidal ideation and suicide attempt among a national sample of female veterans. Med. Care 59, S77–S83. https://doi.org/10.1097/MLR.0000000000001447.
Instituto Nacional de Estadística (INE), 2021 (10 November). Defunciones según la Causa de Muerte Año 2020. https://ine.es/prensa/edcm_2020.pdf.
Kessler, R.C., Borges, G., Walters, E.E., 1999. Prevalence of and Risk Factors for Lifetime Suicide Attempts in the National Comorbidity Survey. Arch. Gen. Psychiatry 56 (7), 617. https://doi.org/10.1001/archpsyc.56.7.617.
Kirtley, O.J., Rodham, K., Crane, C., 2020. Understanding suicidal ideation and behaviour in individuals with chronic pain: A review of the role of novel transdiagnostic psychological factors. Lancet. Psychiatry 7 (3), 282–290. https://doi.org/10.1016/S2215-0366(19)30288-3.
Kleiman, E.M., Turner, B.J., Fedor, S., Beale, E.E., Huffman, J.C., Nock, M.K., 2017. Examination of real-time fluctuations in suicidal ideation and its risk factors: Results from two ecological momentary assessment studies. J. Abnorm. Psychol. 126 (6), 726–738. https://doi.org/10.1037/abn0000273.
Krahmer, E., 2010. What Computational Linguists Can Learn from Psychologists (and Vice Versa). Computational Linguistics 36, 285–294. https://doi.org/10.1162/coli.2010.36.2.36201.
Large, M., Kaneson, M., Myles, N., Myles, H., Gunaratne, P., Ryan, C., 2016. Meta-analysis of longitudinal cohort studies of suicide risk assessment among psychiatric patients: heterogeneity in results and lack of improvement over time. PLoS One 11 (6), e0156322. https://doi.org/10.1371/journal.pone.0156322.
Large, M., Myles, N., Myles, H., Corderoy, A., Weiser, M., Davidson, M., Ryan, C.J., 2018. Suicide risk assessment among psychiatric inpatients: a systematic review and meta-analysis of high-risk categories. Psychol. Med. 48 (7), 1119–1127. https://doi.org/10.1017/S0033291717002537.
10.1176/appi.ajp.2016.16010077.
Barrigón, M.L., Berrouiguet, S., Carballo, J.J., Bonal-Giménez, C., Fernández-Navarro, P.,
Pfang, B., Delgado-Gómez, D., Courtet, P., Aroca, F., Lopez-Castroman, J., Artés-
LeMaster, P.L., Beals, J., Novins, D.K., Manson, S.M., 2004. The prevalence of suicidal behaviors among northern plains American Indians. Suicide Life Threat. Behav. 34 (3), 242–254. https://doi.org/10.1521/suli.34.3.242.42780.
Levis, M., Leonard Westgate, C., Gui, J., Watts, B.V., Shiner, B., 2020. Natural language processing of clinical mental health notes may add predictive value to existing suicide risk models. Psychol. Med. 1–10. https://doi.org/10.1017/S0033291720000173.
Levis, M., Levy, J., Dufort, V., Gobbel, G.T., Watts, B.V., Shiner, B., 2022. Leveraging unstructured electronic medical record notes to derive population-specific suicide risk models. Psychiatry Res. 315, 114703. https://doi.org/10.1016/j.psychres.2022.114703.
Linden, M., Barnow, S., 1997. 1997 IPA/Bayer Research Awards in Psychogeriatrics. The wish to die in very old persons near the end of life: a psychiatric problem? Results from the Berlin Aging Study. Int. Psychogeriatr. 9 (3), 291–307. https://doi.org/10.1017/s1041610297004456.
Links, P.S., Hoffman, B., 2005. Preventing suicidal behaviour in a general hospital psychiatric service: Priorities for programming. Can. J. Psychiatry 50 (8), 490–496. https://doi.org/10.1177/070674370505000809.
Littlewood, D.L., Kyle, S.D., Carter, L.-A., Peters, S., Pratt, D., Gooding, P., 2019. Short sleep duration and poor sleep quality predict next-day suicidal ideation: an ecological momentary assessment study. Psychol. Med. 49 (3), 403–411. https://doi.org/10.1017/S0033291718001009.
Liu, R.T., Bettis, A.H., Burke, T.A., 2020. Characterizing the phenomenology of passive suicidal ideation: A systematic review and meta-analysis of its prevalence, psychiatric comorbidity, correlates, and comparisons with active suicidal ideation. Psychol. Med. 50 (3), 367–383. https://doi.org/10.1017/S003329171900391X.
Lopez-Castroman, J., Jaussent, I., 2020. Sleep disturbances and suicidal behavior. Cur. Topics Behav. Neurosci. 46, 211–228. https://doi.org/10.1007/7854_2020_166.
Lopez-Morinigo, J.-D., Barrigón, M.L., Porras-Segovia, A., Ruiz-Ruano, V.G., Escribano Martínez, A.S., Escobedo-Aedo, P.J., Sánchez Alonso, S., Mata Iturralde, L., Muñoz Lorenzo, L., Artés-Rodríguez, A., David, A.S., Baca-García, E., 2021. Use of ecological momentary assessment through a passive smartphone-based app (eB2) by patients with schizophrenia: acceptability study. J. Med. Internet Res. 23 (7), e26548. https://doi.org/10.2196/26548.
Macrynikola, N., Miranda, R., Soffer, A., 2018. Social connectedness, stressful life events, and self-injurious thoughts and behaviors among young adults. Compr. Psychiatry 80, 140–149. https://doi.org/10.1016/j.comppsych.2017.09.008.
McCoy, T.H., Castro, V.M., Roberson, A.M., Snapper, L.A., Perlis, R.H., 2016. Improving prediction of suicide and accidental death after discharge from general hospitals with natural language processing. JAMA Psychiatry 73 (10), 1064–1071. https://doi.org/10.1001/jamapsychiatry.2016.2172.
Metzger, M.-H., Tvardik, N., Gicquel, Q., Bouvry, C., Poulet, E., Potinet-Pagliaroli, V., 2017. Use of emergency department electronic medical records for automated epidemiological surveillance of suicide attempts: a French pilot study. Int. J. Methods Psychiatr. Res. 26 (2). https://doi.org/10.1002/mpr.1522.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J., 2013. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, p. 26. https://papers.nips.cc/paper/2013/hash/9aa42b31882ec039965f3c4923ce901b-Abstract.html.
Mou, D., Kleiman, E.M., Fedor, S., Beck, S., Huffman, J.C., Nock, M.K., 2018. Negative affect is more strongly associated with suicidal thinking among suicidal patients with borderline personality disorder than those without. J. Psychiatr. Res. 104, 198–201. https://doi.org/10.1016/j.jpsychires.2018.08.006.
Oseguera, O., Rinaldi, A., Tuazon, J., Cruz, A., 2017. Automatic quantification of the veracity of suicidal ideation in counseling transcripts (p. 479). https://doi.org/10.1007/978-3-319-58750-9_66.
Patterson, W.M., Dohn, H.H., Bird, J., Patterson, G.A., 1983. Evaluation of suicidal patients: the SAD PERSONS scale. Psychosomatics 24 (4), 343–345, 348–349. https://doi.org/10.1016/S0033-3182(83)73213-5.
Pestian, J., Nasrallah, H., Matykiewicz, P., Bennett, A., Leenaars, A., 2010. Suicide note classification using natural language processing: a content analysis. Biomed. Inf. Insights 3. https://doi.org/10.4137/BII.S4706.
Pestian, J.P., Grupp-Phelan, J., Bretonnel Cohen, K., Meyers, G., Richey, L.A., Matykiewicz, P., Sorter, M.T., 2016. A controlled trial using natural language processing to examine the language of suicidal adolescents in the emergency department. Suicide Life Threat. Behav. 46 (2), 154–159. https://doi.org/10.1111/sltb.12180.
Posner, K., Oquendo, M.A., Gould, M., Stanley, B., Davies, M., 2007. Columbia classification algorithm of suicide assessment (C-CASA): classification of suicidal events in the FDA’s pediatric suicidal risk analysis of antidepressants. Am. J. Psychiatry 164 (7), 1035–1043. https://doi.org/10.1176/ajp.2007.164.7.1035.
American Psychiatric Association, 2006. Practice guideline for the assessment and treatment of patients with suicidal behaviors. APA Practice Guidelines for the Treatment of Psychiatric Disorders: Comprehensive Guidelines and Guideline Watches (1st ed., Vol. 1). American Psychiatric Association. https://doi.org/10.1176/appi.books.9780890423363.56008.
Quinlivan, L., Cooper, J., Davies, L., Hawton, K., Gunnell, D., Kapur, N., 2016. Which are the most useful scales for predicting repeat self-harm? A systematic review evaluating risk scales using measures of diagnostic accuracy. BMJ Open 6 (2), e009297. https://doi.org/10.1136/bmjopen-2015-009297.
Runeson, B., Odeberg, J., Pettersson, A., Edbom, T., Jildevik Adamsson, I., Waern, M., 2017. Instruments for the assessment of suicide risk: A systematic review evaluating the certainty of the evidence. PLoS One 12 (7), e0180292. https://doi.org/10.1371/journal.pone.0180292.
Saxena, S., Krug, E.G., Chestnov, O., World Health Organization, 2014. Preventing Suicide: A Global Imperative. World Health Organization.
Segal, L., Armfield, J.M., Gnanamanickam, E.S., Preen, D.B., Brown, D.S., Doidge, J., Nguyen, H., 2021. Child Maltreatment and Mortality in Young Adults. Pediatrics 147 (1), e2020023416. https://doi.org/10.1542/peds.2020-023416.
Serrano-Guerrero, J., Olivas, J.A., Romero, F.P., Herrera-Viedma, E., 2015. Sentiment analysis: a review and comparative analysis of web services. Inf. Sci. 311, 18–38. https://doi.org/10.1016/j.ins.2015.03.040.
Toledo-Acosta, M., Barreiro, T., Reig-Alamillo, A., Müller, M., Aroca Bisquert, F., Barrigon, M.L., Baca-Garcia, E., Hermosillo-Valadez, J., 2020. Cognitive emotional embedded representations of text to predict suicidal ideation and psychiatric symptoms. Mathematics 8 (11), 2088. https://doi.org/10.3390/math8112088.
Toledo-Acosta, M., Martínez-Zaldivar, B., Ehrlich-López, A., Morales-González, E., Torres-Moreno, D., Hermosillo-Valadez, J., 2021. Semantic representations of words and automatic keywords extraction for sentiment analysis of tourism reviews. In: Proceedings of the Third Workshop for Iberian Languages Evaluation Forum (IberLEF 2021), pp. 1–16. CEUR-WS Proceedings.
Tran, T., Luo, W., Phung, D., Harvey, R., Berk, M., Kennedy, R.L., Venkatesh, S., 2014. Risk stratification using data from electronic medical records better predicts suicide risks than clinician assessments. BMC Psychiatry 14, 76. https://doi.org/10.1186/1471-244X-14-76.
Turecki, G., Brent, D.A., Gunnell, D., O’Connor, R.C., Oquendo, M.A., Pirkis, J., Stanley, B.H., 2019. Suicide and suicide risk. Nature Rev. Disease Primers 5 (1), 74. https://doi.org/10.1038/s41572-019-0121-0.
Velupillai, S., Hadlaczky, G., Baca-Garcia, E., Gorrell, G.M., Werbeloff, N., Nguyen, D., Patel, R., Leightley, D., Downs, J., Hotopf, M., Dutta, R., 2019. Risk assessment tools and data-driven approaches for predicting and preventing suicidal behavior. Front. Psychiatry 10, 36. https://doi.org/10.3389/fpsyt.2019.00036.
Walker, D.E., 1981. The organization and use of information: contributions of information science, computational linguistics and artificial intelligence. J. Am. Soc. Inf. Sci. 32 (5), 347–363. https://doi.org/10.1002/asi.4630320516.
WHO, 2017. Mental Health ATLAS 2017. https://www.who.int/publications-detail-redirect/9789241514019.
World Health Organization, 2021. Suicide worldwide in 2019: Global health estimates. World Health Organization. https://apps.who.int/iris/handle/10665/341728.
Yip, P.S.F., Chi, I., Chiu, H., Chi Wai, K., Conwell, Y., Caine, E., 2003. A prevalence study of suicide ideation among older adults in Hong Kong SAR. Int. J. Geriatr. Psychiatry 18 (11), 1056–1062. https://doi.org/10.1002/gps.1014.
Zou, K.H., O’Malley, A.J., Mauri, L., 2007. Receiver-operating characteristic analysis for evaluating diagnostic tests and predictive models. Circulation 115 (5), 654–657. https://doi.org/10.1161/CIRCULATIONAHA.105.594929.
