
Journal Pre-proof

Prospective prediction of suicide attempts in community adolescents


and young adults, using regression methods and machine learning

Miché Marcel PhD, Studerus Erich PhD, Meyer Andrea Hans PhD, Gloster Andrew Thomas PhD,
Beesdo-Baum Katja PhD, Wittchen Hans-Ulrich PhD, Lieb Roselind PhD

PII: S0165-0327(19)31141-3
DOI: https://doi.org/10.1016/j.jad.2019.11.093
Reference: JAD 11340

To appear in: Journal of Affective Disorders

Received date: 3 May 2019


Revised date: 20 September 2019
Accepted date: 12 November 2019

Please cite this article as: Miché Marcel PhD, Studerus Erich PhD, Meyer Andrea Hans PhD, Gloster Andrew Thomas PhD, Beesdo-Baum Katja PhD, Wittchen Hans-Ulrich PhD, Lieb Roselind PhD, Prospective prediction of suicide attempts in community adolescents and young adults, using regression methods and machine learning, Journal of Affective Disorders (2019), doi: https://doi.org/10.1016/j.jad.2019.11.093

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition
of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of
record. This version will undergo additional copyediting, typesetting and review before it is published
in its final form, but we are providing this version to give early visibility of the article. Please note that,
during the production process, errors may be discovered which could affect the content, and all legal
disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier B.V.



Highlights

• First study – to the best of our knowledge – to apply machine learning (ML) alongside conventional prediction models to predict future suicide attempts, using data from a 10-year prospective longitudinal study.
• We used a community sample aged 14–34 years over the full study period, covering the high-risk period for the first lifetime suicide attempt, which according to the WHO (2014) lies between 15 and 29 years of age.
• We adhered to the TRIPOD guidelines (Collins et al., 2015) to increase transparency and reproducibility and to facilitate cross-study comparisons.
• We adhered to further recommendations to meet current standards for studies that apply ML; for instance, we used the best current approach for internal cross-validation, as recommended by Krstajic et al. (2014).
• The overall prediction performance of all our selected models falls in the category "very good", according to Šimundić (2009).

Prospective prediction of suicide attempts in community adolescents and young

adults, using regression methods and machine learning

Miché, Marcel, PhD1

Studerus, Erich, PhD2

Meyer, Andrea Hans, PhD1

Gloster, Andrew Thomas, PhD3

Beesdo-Baum, Katja, PhD4, 5

Wittchen, Hans-Ulrich, PhD5, 6

Lieb, Roselind, PhD1

1 University of Basel, Department of Psychology, Division of Clinical Psychology and Epidemiology, Basel, Switzerland
2 University of Basel, Department of Psychology, Division of Personality and Developmental Psychology, Basel, Switzerland
3 University of Basel, Department of Psychology, Division of Clinical Psychology and Intervention Science, Basel, Switzerland
4 Technische Universitaet Dresden, Department of Behavioral Epidemiology, Dresden, Germany
5 Technische Universitaet Dresden, Institute of Clinical Psychology and Psychotherapy, Dresden, Germany
6 Ludwig Maximilians University Munich, Department of Psychiatry and Psychotherapy, Munich, Germany

Corresponding Author

Roselind Lieb, PhD

Division of Clinical Psychology and Epidemiology

Department of Psychology

University of Basel

Missionsstrasse 60-62

4055 Basel

Switzerland

Phone: 0041-61-2070278

Email: roselind.lieb@unibas.ch

Acknowledgments

This work is part of the Early Developmental Stages of Psychopathology (EDSP) Study and is

funded by the German Federal Ministry of Education and Research (BMBF) project nos.

01EB9405/6, 01EB9901/6, EB01016200, 01EB0140, and 01EB0440. Part of the field work

and analyses were additionally supported by Deutsche Forschungsgemeinschaft (DFG)

grants LA1148/1-1, WI2246/1-1, WI 709/7-1, and WI 709/8-1. Principal investigators are Dr.

Hans-Ulrich Wittchen and Dr. Roselind Lieb, who take responsibility for the integrity of the

study data. Core staff members of the EDSP group are Dr. Kirsten von Sydow, Dr. Gabriele

Lachner, Dr. Axel Perkonigg, Dr. Peter Schuster, Dr. Michael Höfler, Dipl.-Psych. Holger

Sonntag, Dr. Tanja Brückl, Dipl.-Psych. Elzbieta Garczynski, Dr. Barbara Isensee, Dr. Agnes

Nocon, Dr. Chris Nelson, Dipl.-Inf. Hildegard Pfister, Dr. Victoria Reed, Dipl.-Soz. Barbara

Spiegel, Dr. Andrea Schreier, Dr. Ursula Wunderlich, Dr. Petra Zimmermann, Dr. Katja

Beesdo-Baum, Dr. Antje Bittner, Dr. Silke Behrendt, and Dr. Susanne Knappe. Scientific

advisors are Dr. Jules Angst (Zurich), Dr. Jürgen Margraf (Basel), Dr. Günther Esser

(Potsdam), Dr. Kathleen Merikangas (NIMH, Bethesda), Dr. Ron Kessler (Harvard

University, Boston), and Dr. Jim van Os (Maastricht University).



Dr. Katja Beesdo-Baum is currently funded by the BMBF (project nos. 01ER1303,

01ER1703).

Abstract

Background. The use of machine learning (ML) algorithms to study suicidality has recently

been recommended. Our aim was to explore whether ML approaches have the potential to

improve the prediction of suicide attempt (SA) risk. Using the epidemiological multiwave

prospective-longitudinal Early Developmental Stages of Psychopathology (EDSP) data set,

we compared four algorithms—logistic regression, lasso, ridge, and random forest—in

predicting a future SA in a community sample of adolescents and young adults.

Methods. The EDSP Study prospectively assessed, over the course of 10 years, adolescents

and young adults aged 14–24 years at baseline. Of 3021 subjects, 2797 were eligible for

prospective analyses because they participated in at least one of the three follow-up

assessments. Sixteen baseline predictors, all selected a priori from the literature, were used to

predict follow-up SAs. Model performance was assessed using repeated nested 10-fold cross-

validation. As the main measure of predictive performance we used the area under the curve

(AUC).

Results. The mean AUCs of the four predictive models, logistic regression, lasso, ridge, and

random forest, were 0.828, 0.826, 0.829, and 0.824, respectively.

Conclusions. Based on our comparison, all four algorithms performed equally well in distinguishing between future SA cases and non-SA cases in community adolescents and young adults. When choosing an algorithm, however, other considerations, such as ease of implementation, might in some instances lead to one algorithm being prioritized over another.

Further research and replication studies are required in this regard.

Keywords: Machine learning, future suicide attempt, prediction, adolescents and young adults,

community sample, prospective design



Introduction

Suicide research has suggested many correlates and some risk factors for completed

suicide, suicide attempt (SA), and suicidal ideation. Nonetheless, according to a recent meta-

analysis, the ability to accurately predict SAs remains poor, rarely exceeding chance

prediction (Franklin et al., 2017). In an endeavor to increase accuracy of SA prediction,

machine learning (ML) algorithms have been recommended (Bentley et al., 2016; Franklin et

al., 2017; Walsh et al., 2018, 2017), in addition to the use of more traditional statistical

approaches, for example, multiple logistic regression (for a brief comparison of both

approaches see Bennett et al., 2019). One advantage of ML algorithms is that they can better

deal with the problem of "overfitting." Overfitting occurs when a statistical model fits well

with one data set, yet fails to accurately predict new observations, a problem for which the

ML framework provides several solutions, for example, adjusting the flexibility with which

the model will learn from the data in order to control the degree of overfitting (Krstajic et al.,

2014; Studerus et al., 2017; Yarkoni and Westfall, 2017).

In suicidality research, some studies applying ML have found that suicidal outcomes can be predicted above chance level, for example, SA (Delgado-Gomez et al., 2012, 2011; Hettige et al., 2017; Just et al., 2017; Mann et al., 2008; Passos et al., 2016; Simon et al., 2018; Walsh et al., 2017) and suicidal behavior (i.e., suicide and SA combined) (Barak-Corren et al., 2017).

When dealing with categorical outcomes, prediction is often quantified using the area

under the receiver operating characteristic curve (AUC). Chance prediction is thereby defined

as an AUC of 0.5. Šimundić (2009) suggested five heuristic categories of AUC results that

she termed "bad" (0.5–0.59), "sufficient" (0.6–0.69), "good" (0.7–0.79), "very good" (0.8–0.89), and "excellent" (0.9–1.0). Walsh et al. (2017) achieved very good prediction accuracy

for a future SA among adult patients, using electronic health record data (EHR; AUC range

0.80–0.84). Furthermore, the random forest model yielded a better prognostic performance

than multiple logistic regression (AUC range 0.66–0.68; Walsh et al., 2017). Walsh et al.

(2018) replicated this finding in a sample of adolescent patients and controls, again using

EHR data, with the random forest model yielding AUCs of more than 0.8, while logistic

regression yielded AUCs of less than 0.7. In the National Comorbidity Survey (NCS), a

community study of 15- to 54-year-olds, Kessler et al. (2016) reported logistic regression

being outperformed by ML models across all reported depression-related outcomes, SA being

one of them (AUC: 0.70 vs. 0.76). Delgado-Gomez and colleagues (2012, 2011) also

compared SA prediction accuracies, applying both ML models, for example, support vector

machines (SVMs), and a traditional model, multiple linear regression, using questionnaire

data of almost 900 adults (admitted to an emergency department, inpatients, and blood

donors) in each of the two cross-sectional studies. In the first study, Delgado-Gomez et al.

(2011) reported that ML models outperformed the traditional model, for example, prediction

accuracy (with 100 being the best possible result) of SVM being 76.7 vs. 71.5 in the linear

regression model, whereas in the second study the ML models and the linear regression model

rendered comparable results (Delgado-Gomez et al., 2012). Other studies that reported an

overall measure of prediction performance with SA as outcome did not report any comparison

between ML and statistical models (Barak-Corren et al., 2017; Hettige et al., 2017; Mann et

al., 2008; Nock et al., 2018; Passos et al., 2016; Simon et al., 2018). While four of these other

SA prediction studies applied ML models only (AUCs ranging between 0.65 and 0.8; Barak-

Corren et al., 2017; Hettige et al., 2017; Mann et al., 2008; Passos et al., 2016), the other two

studies applied statistical models in combination with techniques (e.g., penalization,

replicated n-fold cross-validation) to control overfitting (AUCs being 0.85 [Simon et al.,

2018] and 0.93 [Nock et al., 2018]).

We are unaware of any study that investigated SA prediction accuracy by applying

ML models and/or techniques to control overfitting, and using prospective longitudinal data

from an epidemiological sample of adolescents and young adults (aged 14–24 years at

baseline). This age range can be regarded as a time of "high risk" for incident SA; in fact,

among 15- to 29-year-olds, suicide is the second leading cause of death (World Health

Organization [WHO], 2014). Thus the three properties, namely, prospective study design,

general community, and young age group, are important, both methodologically (e.g.,

temporally prospective vs. cross-sectional data analysis; Kraemer, 2010; Kraemer et al., 1997)

and practically. That is, in terms of testing the utility of ML approaches it is essential to

derive indicators that are able to help clinical decision makers, such as general practitioners or

pediatricians, better recognize the individual risk of a future SA (or suicide) as early as

possible. To explore the utility of ML approaches we examined the prediction accuracy of

four prediction approaches, namely, three regression-based models (logistic, lasso, and ridge),

and one ML model (random forest), using the data of the epidemiological Early

Developmental Stages of Psychopathology (EDSP) Study, which prospectively assessed

community adolescents and young adults over the course of 10 years.

Methods

Sample

In the EDSP Study, community adolescents and young adults were assessed up to four

times between 1995 and 2005. At baseline, participants were between 14 and 24 years of age.

The four assessments T0–T3 included sample sizes of, respectively, 3021 (T0, response =

70.9%), 1228 (T1, response = 88%, range 1.2–2.1 years after baseline), 2548 (T2, response =

84.3%, range 2.8–4.1 years after baseline), and 2210 (T3, response = 73.2%, range = 7.3–10.6

years after baseline). At baseline, T2, and T3, subjects from the full sample were assessed; at

T1 a subsample of those 14–17 years old at baseline was assessed. Subjects were selected

from the government registries of the greater Munich area, Germany; 14- to 15-year-olds

were sampled at twice the probability of 16- to 21-year-olds, whereas 22- to 24-year-olds

were sampled at half the probability. Sample weights were generated to account for this

sampling scheme. Further details of the EDSP Study methods, design, and sample

characteristics have been presented elsewhere (Beesdo-Baum et al., 2015; Lieb et al., 2000a;

Wittchen et al., 1998b). The EDSP project was reviewed by the Ethics Committee of the

Medical Faculty at the Dresden University of Technology. All participants provided informed

consent.

Selection and assessment of predictors

We selected 16 predictors. First, predictors were derived a priori from the research

literature on suicidality (Cha et al., 2018; Franklin et al., 2017; Miché et al., 2018; Nock et al.,

2008), as currently recommended for ML studies (e.g., Passos et al., 2016; Steyerberg, 2009).

Our literature-guided predictor selection was based on the broad risk and protective factor

categories presented in the extensive meta-analysis by Franklin et al. (2017) to ensure each of

our predictors maps onto one of these categories identified in the last 50 years of suicidality

research. Our predictors map onto the categories of demographics, cognitive abilities, family

history of psychopathology, general psychopathology, psychosis, prior self-injurious thoughts

or behaviors, social factors, and treatment history. Second, predictors were selected from the

EDSP baseline assessment only, in order to ensure the temporal order of predictors and the

outcome, that is, future SA (between T1 and T3). Third, we remained close to a recommended

events per variable (EPV) value of 10, that is, to have 10 cases per predictor (Studerus et al.,

2017) in order to avoid methodological shortcomings, such as unreliable predictor selection

(Mushkudiani et al., 2008). Since we observed 137 future SAs, our EPV was 8.5. It should be

noted, however, that high EPV values are not as important in penalized regression methods as

they are in other methods (Pavlou et al., 2016).

Of the 16 baseline predictors (in the following labeled with letters a–p), 10 were

assessed with the computer-assisted Munich-Composite International Diagnostic Interview

(DIA-X/M-CIDI; Wittchen and Pfister, 1997), a fully structured clinical interview for the

assessment of syndromes, symptoms, and mental disorders pertaining to the Diagnostic and

Statistical Manual of Mental Disorders (4th ed.; DSM-IV; American Psychiatric Association,

1994), along with various items of personal information. The DIA-X/M-CIDI has shown good

to excellent reliability (Wittchen et al., 1998a) and validity (Reed et al., 1998). The baseline

predictors assessed with the DIA-X/M-CIDI were (a) sex, (b) age, (c) education, (d) the

number of DSM-IV lifetime mental diagnoses (including panic disorder [PD], agoraphobia

with or without PD, social phobia, specific phobia, generalized anxiety disorder, post-

traumatic stress disorder, obsessive compulsive disorder, major depressive disorder [MDD],

dysthymia, any bipolar disorder, nicotine dependence, alcohol abuse or dependence, drug

abuse or dependence, pain disorder, and any eating disorder), (e) the number of lifetime

traumatic events (including war experience, physical attack, natural disaster, serious accident,

imprisonment/kept hostage/abduction, witness to someone else experiencing a traumatic

event), (f) rape or childhood sexual abuse (excluded from predictor (e)), (g) parental loss or

separation, (h) prior help seeking for any kind of psychological difficulty, and (i) parental

psychopathology (assessed via family history information provided by the offspring at

baseline; for its criterion-related validity, see Lieb et al., 2000b). The baseline predictor (j),

prior SA (lifetime), as well as the outcome, future SA (follow-up), was assessed in section E

of the DIA-X/M-CIDI. At baseline the SA question read: "Have you ever attempted suicide?"

At each follow-up (DIA-X/M-CIDI interval versions) it read: "Since our last interview, have

you attempted suicide?‖ At both baseline and T1, only those participants who had confirmed

at least one of the MDD stem questions were asked the SA question (unavailable baseline

data on lifetime SA were set to "no SA"), whereas at both T2 and T3, all participants were

asked the SA question (T2: lifetime, T3: since last interview).

Additional predictors assessed at baseline were (k) behavioral inhibition (assessed

with the Retrospective Self-Report of Inhibition (RSRI); Reznick et al., 1992), (l) subclinical

psychotic experiences during the previous 7 days (assessed with the SCL-90-R; Derogatis et

al., 1973), (m) negative life events in the previous 5 years (assessed with the Munich Life

Event List; Maier-Diewald et al., 1983), (n) daily hassles in the previous 2 weeks (assessed

with the Daily Hassles Scale; Perkonigg and Wittchen, 1995a), whether the participant was

(o) living in a rural area (population density of 553 inhabitants per square mile) or in an

urban area (population density of 4061 inhabitants per square mile) (Spauwen et al., 2004),

and (p) subjectively perceived coping efficacy within the next 6 months (assessed with the

German Scale for Self-Control and Coping Skills; Perkonigg and Wittchen, 1995b; higher

scale values denote lower perceived coping efficacy).

Data analysis

The outcome predicted was a reported SA after baseline (binary: yes–no). We used

four prediction models: logistic regression, lasso and ridge (both penalized variants of logistic regression),

and random forest, a widely used ML algorithm (Fernández-Delgado et al., 2014).

All data-related procedures were done in the statistical software environment R,

version 3.3.3 (R Core Team, 2017). In the preprocessing of the data we excluded all cases

without any follow-up data (n = 224), or missing data (n = 4) in any predictor variable at

baseline, resulting in an N of 2793. Our chosen ML models could not deal with missing data

and since there were only four such cases, we did not see the need to apply imputation

methods, assuming that results would not be much different. The categories for the predictor

of education (low, middle, high, other) were modified by merging the categories low and

other, the latter representing a high-risk group of low educational attainment (endorsed by

2.7% of N = 2793). In our sample there were 137 future SA cases (weighted percentage =

4.9). For the application of all prediction models, we used the R package mlr (Machine

Learning in R; Bischl et al., 2016), which is a framework for ML experiments in R (R Core

Team, 2017).
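As a minimal sketch of the preprocessing just described, the following base R code illustrates the exclusions and the merging of the education categories; the data frame edsp and its column names (any_followup, education) are hypothetical placeholders, not the actual EDSP variable names.

## Illustrative preprocessing sketch; 'edsp', 'any_followup', and 'education' are assumed names.
edsp <- edsp[edsp$any_followup, ]        # drop the n = 224 cases without any follow-up data
edsp <- edsp[complete.cases(edsp), ]     # drop the n = 4 cases with missing baseline predictor data

## merge the education categories "low" and "other" into a single level
levels(edsp$education)[levels(edsp$education) == "other"] <- "low"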

Prediction models and performance measures

We selected a conventional logistic regression model (based on the full set of

predictors, yet testing for collinearity, with maximum absolute correlation between predictors

of 0.4 and a maximum variance inflation factor of 1.74) and two other models of the logistic

regression family, lasso and ridge, which include an additional penalty parameter that shrinks the coefficients of predictors

with low predictive contributions. The ML model we selected, random forest, belongs to the

family of ensemble classifiers. Random forests have been shown to make the best predictions

across diverse data sets in comparison to many other algorithms, for example, neural

networks (Fernández-Delgado et al., 2014). The single prediction models were computed by

mlr (Bischl et al., 2016), accessing the R-packages that were relevant for our analyses: For

logistic regression this was the R base package stats; for both lasso and ridge this was

LiblineaR (Helleputte, 2017), and for the random forest model this was the ranger package

(Wright and Ziegler, 2017).
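For illustration, the four models can be set up in mlr roughly as follows. This is a sketch only; the task construction, the outcome column name future_sa, and the positive-class label are assumptions rather than the exact analysis code.

library(mlr)

## Classification task; 'edsp' and the outcome column 'future_sa' are hypothetical names.
task <- makeClassifTask(id = "sa", data = edsp, target = "future_sa", positive = "yes")

## The four prediction models as mlr learners; predict.type = "prob" is required for AUC and Brier.
learners <- list(
  logreg = makeLearner("classif.logreg", predict.type = "prob"),             # stats::glm
  lasso  = makeLearner("classif.LiblineaRL1LogReg", predict.type = "prob"),  # LiblineaR, L1 penalty
  ridge  = makeLearner("classif.LiblineaRL2LogReg", predict.type = "prob"),  # LiblineaR, L2 penalty
  rf     = makeLearner("classif.ranger", predict.type = "prob", num.trees = 500)  # ranger random forest
)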

The procedure of obtaining the final results in mlr (Bischl et al., 2016) consisted of the

following steps:

First, with the aim of having each prediction model weight all 16 predictor

variables equally, we normalized them.

Second, in accordance with the Transparent Reporting of a Multivariable Prediction

Model for Individual Prognosis or Diagnosis (TRIPOD) statement (Collins et al., 2015), we

selected performance measures relating to both discrimination and overall performance, with the former measuring a model's ability to accurately discriminate new outcome cases and the latter combining discrimination and calibration. We chose the AUC as the measure of discrimination, which summarizes the trade-off between the true positive rate (sensitivity) and the false positive rate (1–specificity) across all

possible thresholds of predicted probabilities (from 0 to 1), according to which each observed

case is assigned to the outcome class of either 0 (no event) or 1 (event). As the measure of

overall performance, we chose the scaled Brier score. The best model performance is denoted

by the highest scaled Brier score, which is conceptually similar to Pearson's R² statistic

(Steyerberg et al., 2010). Calibration denotes one particular aspect of a prediction model’s

accuracy, namely the agreement of predicted SA risk and actually observed SA rates (Alba et

al., 2017; Steyerberg, 2009; Studerus et al., 2017). Due to limitations of the AUC in

imbalanced datasets (e.g., Lobo et al., 2008), where the outcome group is much smaller than

the non-outcome group, we additionally report two other important performance metrics:

sensitivity (in ML termed recall) and positive predictive value (PPV; in ML termed

precision). Whereas sensitivity describes the proportion of those the model classifies as

having the outcome (testing positive) among those who actually have the outcome, PPV

describes the proportion of those who actually have the outcome among those who tested

positive. Values for both sensitivity and PPV can range between 0 (worst) and 1 (best).
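For reference, these measures can be written compactly as follows (standard definitions, not quoted verbatim from the cited sources), with $y_i \in \{0,1\}$ the observed outcome of subject $i$, $p_i$ the predicted SA probability, $\bar{p}$ the observed outcome rate, and TP, FP, FN the true positive, false positive, and false negative counts at a given probability threshold:

$\text{Brier} = \frac{1}{N}\sum_{i=1}^{N}(p_i - y_i)^2$, scaled as $\text{Brier}_{\text{scaled}} = 1 - \text{Brier}/\{\bar{p}(1-\bar{p})\}$ (Steyerberg et al., 2010);

$\text{sensitivity} = TP/(TP+FN)$ and $\text{PPV} = TP/(TP+FP)$;

and the AUC is the area under the curve of sensitivity plotted against $1-\text{specificity}$ across all thresholds.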

Third, as a means to avoid performance estimates that are optimistically biased

(overfitting), we applied repeated nested cross-validation, which is the recommended method

of choice (Krstajic et al., 2014), whenever the gold-standard, external validation, cannot be

applied (Bleeker et al., 2003). Internal cross-validation includes the strict separation of a

given data set into a training data set, used to build a prediction model, and a test data set,

used to validate the model (Steyerberg, 2009; Studerus et al., 2017). Repeated nested cross-

validation is a two-stage process. At stage 1, the selected hyperparameters of the model are

tuned, such that the model’s performance is optimized, as measured on a validation data set.

Hyperparameters are different from the standard model parameters (e.g., weights in a

regression model) in that they do not represent the learning from the data itself but instead

define higher level properties of the model, which cannot be learned from the data. Tuning of

hyperparameters means specifying how the model will learn from the data, for example, the

degree of model complexity, for which we used an automated grid search with 10 different

hyperparameter values. For the lasso and ridge regression models we chose to tune the

parameter cost (cost of constraints violation) in the range of 0.001 to 0.3. For the random

forest model we chose to tune the parameter mtry (number of variables randomly sampled as

candidates at each split) in the range of 1 to 16, while keeping the parameter ntree (number

of trees to grow) at its default value of 500, because tuning this parameter is generally not

recommended (Probst and Boulesteix, 2018). For selecting the best tuning-based prediction

model, we used stratified out-of-bag bootstrapping with 50 iterations (stratification being

useful for imbalanced class sizes). Bootstrapping with iterations generates multiple samples

from and of the same size as the original data set. The training of the model uses the sampled

cases (for each of the 10 hyperparameter values), after which the model is validated on the so-

called out-of-bag data, which has not been used for model building in the respective bootstrap

iteration. Bootstrapping is a recommended choice when selecting a prediction model, because

overfitting is strongly avoided (Kuhn and Johnson, 2013, p. 78). At stage 2 of the repeated

nested cross-validation, the optimal prediction model of stage 1 is used, with the aim of

estimating this model’s final prediction performance, for which we used 10-fold repeated

cross-validation with 10 repetitions. Repeated n-fold cross-validation is

recommended for several reasons, for example, to obtain robust estimates of model

performance (Kuhn and Johnson, 2013, p. 78). With this setup we obtained 100 estimates of

prediction performances for each model.
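A schematic of this two-stage procedure in mlr, shown here for the lasso learner, might look as follows. This is an illustrative outline under the parameter ranges given above, not the exact analysis script; auc and brier are mlr's built-in measures, and the standardization method is an assumption.

library(mlr)
set.seed(1)

## Normalize the 16 predictors (standardization assumed) so that each model weights them equally;
## 'task' is the classification task defined above.
task_norm <- normalizeFeatures(task, method = "standardize")

## Stage 1 (inner loop): tune the penalty parameter 'cost' on a 10-point grid,
## selecting the best value via stratified out-of-bag bootstrapping with 50 iterations.
inner <- makeResampleDesc("Bootstrap", iters = 50, stratify = TRUE)
lasso_tuned <- makeTuneWrapper(
  makeLearner("classif.LiblineaRL1LogReg", predict.type = "prob"),
  resampling = inner,
  par.set    = makeParamSet(makeNumericParam("cost", lower = 0.001, upper = 0.3)),
  control    = makeTuneControlGrid(resolution = 10),
  measures   = auc
)

## Stage 2 (outer loop): estimate the tuned model's performance with
## 10-fold cross-validation repeated 10 times, yielding 100 performance estimates.
outer <- makeResampleDesc("RepCV", folds = 10, reps = 10)
res   <- resample(lasso_tuned, task_norm, resampling = outer, measures = list(auc, brier))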

Results

Predictive performance measures

Means and medians of AUC and scaled Brier score of all four models are shown in

Table 1. AUC values were very similar among the four models for both mean (0.824–0.829)

and median (0.822–0.830), with strongly overlapping boxplots (Fig. 1). The scaled Brier

score was highest for the ridge model (mean: 0.466, median: 0.461) while the values of the

other three models ranged between 0.136 and 0.245 (mean) and 0.167 and 0.246 (median).

Sensitivity and positive predictive value

Mean sensitivity and positive predictive value (PPV), each based on a predicted

probability cutoff of 0.5, are shown in Table 1. Mean sensitivity ranged from 2.8% (random

forest) to 25% (ridge regression), whereas both logistic and lasso regression showed similar

sensitivities of around 22%. PPV among the logistic regression family fell into a close range

of between 66% and 72%, whereas the random forest achieved a PPV of 87%.

- Figure 1 here -

- Table 1 here -

Predictor importance

Predictor importance values for each model are summarized in Table 2. In all four

prediction models, the most important predictor was prior SA. In the logistic-regression-based

models, it increased the odds of a future SA by 57% (logistic), 55% (lasso), and 14% (ridge).

All following ranks, that is, ranks 2 to 16, were not consistent across all four models. Whereas

education ranked second in the logistic and lasso models (33% and 30% risk decrease,

respectively), prior help seeking ranked second in the ridge model (5% risk increase), and

number of DSM-IV lifetime mental disorders ranked second in the random forest model. Prior

help seeking ranked third in all models except for the ridge model, showing a risk increase for

a future SA of around 30% (logistic and lasso models). In the ridge model the number of

DSM-IV lifetime mental disorders ranked third, with a risk increase of 4%. Negative life

events and psychotic experiences were discarded by the lasso model, indicating that these two

variables were not useful in predicting the risk of future SAs.
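The percentage changes in odds reported above map onto logistic regression coefficients through the standard relation (a general identity, not an additional result of this study):

$OR = e^{\beta} \;\Rightarrow\; \beta = \ln(OR)$; for example, a 57% increase in odds means $OR = 1.57$, i.e., $\beta = \ln(1.57) \approx 0.45$ per unit of the normalized predictor.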

Regarding the overall predictor importance ranking, the logistic and lasso regression models

showed a 44% concordance, that is, 7 of 16 predictors had the exact same rank in both

models. Rank concordance ranged between 6% and 12% for all other possible comparisons of

two models. When permitting ranks per predictor to differ by a maximum of 1 between two

models, rank concordance increased to 100% when comparing logistic and lasso regression,

while for all other comparisons rank concordance increased to between 19% and 38%. All three regression-based

models assigned similarly high ranks to the predictors parental loss or separation and

behavioral inhibition, respectively (ranking between fifth and seventh).

- Table 2 here -

Discussion

All four prediction models, that is, logistic regression, lasso, ridge, and random forest,

yielded comparable prediction accuracies. According to categories of AUC results, our results

(median AUC ranging between 0.822 and 0.830) represent a very good prediction (Šimundić,

2009). In terms of Cohen's d, our AUC results can be translated to an effect size of about 1.3 (Rice and Harris, 2005); a worked conversion is sketched below.
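The conversion rests on the binormal, equal-variance relation between the AUC and d (Rice and Harris, 2005), with $\Phi$ the standard normal cumulative distribution function:

$\mathrm{AUC} = \Phi\!\left(d/\sqrt{2}\right) \;\Leftrightarrow\; d = \sqrt{2}\,\Phi^{-1}(\mathrm{AUC})$,

so that, for example, an AUC of 0.826 gives $d = \sqrt{2}\,\Phi^{-1}(0.826) \approx 1.3$.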

When comparing the discriminative ability of our prediction models with other studies predicting SA on the individual level, our results fit into the upper part of

the AUC range of 0.65–0.93 across these studies (Hettige et al., 2017; Kessler et al., 2016;

Mann et al., 2008; Passos et al., 2016; Simon et al., 2018; Walsh et al., 2018, 2017). However,

we refrain from comparisons with most of these studies, because of the fundamental

differences between them and our study, for instance, in terms of sample type (mostly patients

or army soldiers vs. community), sample size, study design (mostly cross-sectional or

electronic health record data vs. prospectively assessed data), and age group (almost

exclusively adults vs. adolescents/young adults). The only exception in terms of

comparability is the NCS study by Kessler et al. (2016) who also used a representative

community sample to prospectively predict SAs. However, the sample they used consisted of

a subsample of 1056 respondents (age range reported only for the full sample) with a DSM-

III-R (American Psychiatric Association, 1987) lifetime MDD diagnosis at baseline (1990-

1992), who were reinterviewed once 10–12 years after baseline. SA was reported by 4.5% of

those respondents. Whereas the ML models contained between 9 and 13 predictors, logistic

models contained 23 predictors.



Several possible reasons might explain the difference between Kessler et al.’s (2016)

results for SA (AUC: 0.70 by logistic models, 0.76 by ML models) and our results for SA

(AUC: around 0.82 by both logistic models and the ML model). First and foremost, Kessler

et al. (2016) used prediction models that were developed using the baseline data (van Loo et

al., 2014; Wardenaar et al., 2014), and then applied these models independently to the follow-

up data. Other possible explanations for differing results might be sample source (NCS: MDD

diagnosis vs. EDSP Study: general community), diagnostic criteria (DSM-III-R vs. DSM-IV),

age range at baseline (15–54 years vs. 14–24 years), number of assessment waves within the

respective study period (two in 10–12 years vs. a maximum of four in 10 years), and number

of predictors used in both the logistic models and the ML models (23 for logistic and 9–13 for

ML vs. 16 in both logistic and ML). Notably, Kessler et al. (2016) did not use prior SA as one

of the predictors, which turned out to be the most important predictor across all of our

prediction models. This, too, might explain differences between results.

Unlike the AUC, the scaled Brier score does not come with recommended cut-off

categories. We can therefore only descriptively note that the ridge regression performed best

in terms of the scaled Brier score (combination of prediction accuracy and calibration),

whereas the other three models performed less well, with a 47% to 71% reduced scaled Brier

score. Interestingly, even though the ridge model showed no particularly increased AUC

values (see Fig. 1, left panel), the scaled Brier score markedly differed from the other models,

in terms of both the median and the variability (see Fig. 1, right panel).

The question arises as to why some studies reported ML models outperforming

conventional logistic or linear regression models (Delgado-Gomez et al., 2011; Kessler et al.,

2016; Walsh et al., 2018, 2017), whereas other studies (SA: Delgado-Gomez et al., 2012;

suicide: Kessler et al., 2017, 2015), including ours, reported comparable prediction

performances. Several suggestions can be found in the literature. The advantages of ML

depend on several data-related properties, for example, on sample size (Hahn et al., 2017)

(ML prefers "big data"), on high-dimensional complexity (e.g., nonlinear associations, high-

order interactions) actually being present in the data (Walsh et al., 2018), on predictor sets

that contain different data types and sources (Lee et al., 2018), and, according to Walsh et al.

(2018), on how difficult group differences are to detect, which might be more difficult in two

relatively homogeneous groups (e.g., suicide ideators with vs. without SA) than in

heterogeneous groups (e.g., general community members with vs. without SA). Another

explanation for comparable prediction performances across different models might be

whether there is a sufficient number of outcome cases per predictor (the EPV

recommendation is 10; Studerus et al., 2017). On the one hand, the above-mentioned studies

that found ML to outperform logistic or linear regression (Delgado-Gomez et al., 2011;

Kessler et al., 2016; Walsh et al., 2018, 2017), used patient or MDD-diagnosis samples of

various sizes (Ns ranging between 879 and over 33000), which additionally fulfill some of the

other criteria that the ML approach seems to favor. However, there are two studies by Kessler

et al. (2017, 2015) on U.S. army soldiers, both using suicide as outcome, in either individuals

hospitalized with mental disorders or psychiatric outpatients. In both studies conventional

regression models performed as well as ML models, despite the large sample

sizes (between 40000 and 975000), despite a presumably high complexity in the actual data,

despite predictor sets of different data types and sources, and despite the homogeneity of the

samples, which might have made it somewhat difficult to detect group differences. Notably,

ML models were used, both to predict the outcome and to select a lower number of relevant

predictors, which then were used in discrete-time survival (Kessler et al., 2015) or logistic

regression (Kessler et al., 2017) models. Nonetheless, the overall prediction performance was

comparably high between conventional regression and ML models (0.84 vs. 0.85 (Kessler et

al., 2015) and 0.72 vs. 0.72 (Kessler et al., 2017)). Therefore, our study results might not be

fully explained by the above-mentioned criteria that favor the use of ML, which are not

completely met by the EDSP data. Of note, a recent systematic review by Christodoulou et

al. (2019) found no performance benefit of ML over logistic regression for clinical prediction

models in 71 studies across several epidemiological research fields, such as psychiatry,

cardiology, or oncology. Similarly, Belsher et al. (2019) conclude that ML models currently

are not ready for clinical applications across health systems concerning SA and suicide

deaths, due to several critical concerns that in their view have remained unaddressed.

In addition to our main performance metric AUC, we also calculated both sensitivity

and PPV. All of these performance metrics may make most sense in combination, since each

captures a specific aspect of model performance. While the AUC is recommended by some

authors as a global model performance metric (e.g., Bradley, 1997), others acknowledge its

widespread use (Saito and Rehmsmeier, 2014), and yet others call for it to be abandoned or

replaced (Lobo et al., 2008; Wald and Bestwick, 2014). However, to date the AUC still seems

to be useful for comparing model performances across studies, which in our view is

somewhat less the case with sensitivity and the PPV. Unlike the AUC, sensitivity is not a

global measure applicable across all possible thresholds of predicted probabilities, but it is a

local measure for one specific threshold. The PPV depends on the outcome base rate, whereas

the AUC does not (Hajian-Tilaki, 2013), which makes comparison across studies difficult to

the degree that base rates differ. When applying both sensitivity and the PPV to compare our

models with each other, the approximate model performance equality (in terms of the AUC)

disappears. Instead, only the logistic regression family performs in a close range, with

sensitivities (for a predicted probability threshold of 0.5) being relatively low, between 20%

and 25%, and PPVs (for the average outcome rate of about 5%) being fairly high between

66% and 72%. The random forest model, on the other hand, shows an extremely low

sensitivity of 3%, yet the highest PPV of 87% across all four models. We emphasize that the

AUC and measures such as sensitivity and PPV evaluate model performance very differently.

One important aspect that must not be neglected is the context in which one of these measures

is more appropriate than another. For instance, in a model comparison study such as this one,

the AUC is more appropriate since it captures overall model performance, whereas when it

comes to the clinical application of the model, then finding and setting a probability threshold

(to balance specificity and sensitivity) by applying a loss/utility/cost function that depends on

several contextual considerations is more appropriate.
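To make the base-rate dependence of the PPV noted above explicit, it can be written via Bayes' rule; the numerical values below are purely illustrative and are not estimates from our data:

$\mathrm{PPV} = \dfrac{\mathrm{sensitivity}\cdot\pi}{\mathrm{sensitivity}\cdot\pi + (1-\mathrm{specificity})(1-\pi)}$,

where $\pi$ denotes the outcome base rate. With, say, a sensitivity of 0.25 and a specificity of 0.99, a base rate of $\pi = 0.05$ yields a PPV of about 0.57, whereas $\pi = 0.20$ yields about 0.86, although the underlying discrimination (and hence the AUC) is unchanged.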

Many ML models are considered black boxes (Gilpin et al., 2018); that is, even though

the importance of the predictors can be extracted from a model, the self-learning algorithm

might have used the predictors for computing the outcome in such a way that human beings

are not able to comprehend it, for example, a 10th-order interaction. The random forest model

selected constructs as most important predictors that differed from those of the logistic

regression models. Even within the logistic regression models there were some differences

(see Table 2, e.g., logistic and ridge). This poses the difficult question of which predictor

selection mechanisms to "trust" when trying to interpret the results. Irrespective of this issue,

it is interesting to note that prior SA was the most important predictor across all models,

confirming this variable’s reputation as supplying the highest predictive power for a

subsequent SA (Borges et al., 2010, 2006; Brown et al., 2000; Glenn and Nock, 2014; Joiner

et al., 2005; Kuo et al., 2001; Nordström et al., 1995; Ribeiro et al., 2016; WHO, 2014). In

particular, we would emphasize that we compared the predictors’ rank across models, so the

magnitude of the coefficients should not be compared between nonpenalized and penalized

logistic regression on account of the coefficients being regularized (biased) in the latter case.

The second most important predictor was educational level in the logistic and lasso models.

This confirms the plausibility of this variable as being protective against SA, for example, in

that higher educational achievement in adolescence is associated with greater life satisfaction

(Crede et al., 2015). In the ridge model, prior psychological help seeking was selected as

second most important predictor, whereas it ranked third in the random forest, logistic and

lasso models, respectively. Prior psychological help seeking might thus be seen as indicating

a greater severity of psychological problem(s) or disorder(s) present at that time (Han et al.,

2018; Hom et al., 2015), which might serve as one possible explanation for the positive

association with SA. Finally, the number of prior mental disorders (comorbidity) has often

been found to be associated with SA (e.g., Bronisch and Wittchen, 1994; Lewinsohn et al.,

1995; Miché et al., 2018), which is confirmed by the random forest and the ridge regression

models ranking this predictor second and third respectively.

We want to mention several strengths of our study. First, to the best of our knowledge

this is the first study that applied ML procedures to prospectively predict SAs in community

adolescents and young adults (an assumption supported by a recent systematic review

on the use of ML in the study of suicidal behaviors; Burke et al., 2019), a group that is known

to be the high-risk group for first lifetime SA (WHO, 2014). Second, we used repeated nested

cross-validation, which Krstajic et al. (2014) recommended as the best approach for training

and testing a prediction model within a single dataset, that is, external validation being

inapplicable. Third, we adhered to the reporting guidelines known as the TRIPOD statement

(Collins et al., 2015). This strength is also supported by two systematic reviews (Burke et al.,

2019; Christodoulou et al., 2019), which criticize the inconsistent reporting of

classifier performance across studies. Fourth, we used predictors that were a priori defined,

taken from the suicide literature. We assume that this and the EDSP data quality might have

led to the very good (Šimundić, 2009) discriminative ability of the predictive models we

applied.

There are also limitations of our study. First, the predictive performance of ML

algorithms such as random forests depends on the sample size, with larger sample sizes at

times leading to an increased performance result (Raudys and Jain, 1991). In that respect our

sample size may be considered a weakness. It may also be argued that it is not sample size per

se which matters, but rather the relationship between predictor and outcome in the data, that is

to say, whether additive or multiplicative (interaction). In the case of an additive association,

ML techniques such as random forest may simply not be able to show their predictive

potential, as opposed to a robust multiplicative association. Second, our 10-year longitudinal

study design enabled us to include predictors that must be conceived as distal, as opposed to

proximal. Future research on predicting individual SA risk should include both distal and

proximal risk factors, since the main purpose of predictive analytics is to offer tools for risk

assessment in the near future, rather than in the distant future. Third, we used self-reported

data, which is subject to several inherent biases, for example, recall bias. It also

means that we lacked other than self-report data, e.g., genetic or neuropsychological data.

Fourth, we did not apply external cross-validation, which is considered the gold-standard in

estimating the degree of overfitting and which might have yielded lower model performances

compared to our cross-validation procedure. Fifth, our outcome was assessed with a one-item

measure, which might have led to an increased misclassification rate, estimated by Millner et

al. (2015) to be 11%. However, this possible error rate must not be overstated either. Mazza et

al. (2011) empirically support the notion that single-item SA responses appear to be valid.

Sixth, there might have been undetected SA cases at T1, depending on whether participants

entered the MDD interview section. However, we consider this a minor limitation because T1

was the only one of the four EDSP waves where a subsample was assessed.

Despite these limitations, our study has shown that all four models resulted in a very

good overall ability to discriminate individuals who attempt suicide in the future

from individuals who do not, in a high-risk sample of community adolescents and young

adults. This might be seen as a promising contribution to the ongoing pursuit of fruitfully

combining both statistical methods and ML methods, aiming to improve SA risk assessment

of individuals. One possible clinical implication of ML studies using survey-based self-report

data from the general community might be to use the best model or combination of models as

a screening tool for SA in primary prevention efforts.

Author declaration

We wish to confirm that there are no known conflicts of interest associated with this

publication and there has been no significant financial support for this work that could have

influenced its outcome.

We confirm that the manuscript has been read and approved by all named authors and that

there are no other persons who satisfied the criteria for authorship but are not listed. We

further confirm that the order of authors listed in the manuscript has been approved by all of

us.

We confirm that we have given due consideration to the protection of intellectual property

associated with this work and that there are no impediments to publication, including the

timing of publication, with respect to intellectual property. In so doing we confirm that we

have followed the regulations of our institutions concerning intellectual property.

We understand that the Corresponding Author is the sole contact for the Editorial process

(including Editorial Manager and direct communications with the office). He/she is

responsible for communicating with the other authors about progress, submissions of

revisions and final approval of proofs. We confirm that we have provided a current, correct

email address which is accessible by the Corresponding Author and which has been

configured to accept email from roselind.lieb@unibas.ch.

Brief statement concerning each named author's contributions to the paper under the heading
Contributors:
Author Marcel Miché did the literature searches, undertook the statistical analyses, and wrote
the first draft of the manuscript.
Author Erich Studerus reviewed the statistical analyses and the reporting of our study results.
Author Andrea Meyer reviewed methodological parts of the manuscript.
Author Andrew Gloster reviewed the manuscript.

Author Katja Beesdo-Baum reviewed the manuscript.


Author Hans-Ulrich Wittchen is one of two principal investigators of the EDSP study and
reviewed the manuscript.
Author Roselind Lieb is the other principal investigator of the EDSP study. She reviewed the
manuscript and is the corresponding author.
All authors contributed to and have approved the final manuscript.

Conflict of Interest

Declarations of interest: none.

References

Alba, A.C., Agoritsas, T., Walsh, M., Hanna, S., Iorio, A., Devereaux, P.J., McGinn, T.,

Guyatt, G., 2017. Discrimination and Calibration of Clinical Prediction Models:

Users’ Guides to the Medical Literature. JAMA 318, 1377–1384.

https://doi.org/10.1001/jama.2017.12126

American Psychiatric Association, 1994. Diagnostic and statistical manual of mental

disorders (4th ed.). Author: Washington, DC.

American Psychiatric Association, 1987. Diagnostic and statistical manual of mental

disorders (3rd ed., revised). Author: Washington, DC.

Barak-Corren, Y., Castro, V.M., Javitt, S., Hoffnagle, A.G., Dai, Y., Perlis, R.H., Nock, M.K.,

Smoller, J.W., Reis, B.Y., 2017. Predicting Suicidal Behavior From Longitudinal

Electronic Health Records. Am. J. Psychiatry 174, 154–162.

https://doi.org/10.1176/appi.ajp.2016.16010077

Beesdo-Baum, K., Knappe, S., Asselmann, E., Zimmermann, P., Bruckl, T., Hofler, M.,

Behrendt, S., Lieb, R., Wittchen, H.U., 2015. The "Early Developmental Stages of

Psychopathology (EDSP) Study": a 20-year review of methods and findings. Soc.

Psychiatry Psychiatr. Epidemiol. 50, 851–866. https://doi.org/10.1007/s00127-015-1062-x

Belsher, B.E., Smolenski, D.J., Pruitt, L.D., Bush, N.E., Beech, E.H., Workman, D.E.,

Morgan, R.L., Evatt, D.P., Tucker, J., Skopp, N.A., 2019. Prediction Models for

Suicide Attempts and Deaths: A Systematic Review and Simulation. JAMA

Psychiatry. https://doi.org/10.1001/jamapsychiatry.2019.0174

Bennett, D., Silverstein, S.M., Niv, Y., 2019. The Two Cultures of Computational Psychiatry.

JAMA Psychiatry. https://doi.org/10.1001/jamapsychiatry.2019.0231

Bentley, K.H., Franklin, J.C., Ribeiro, J.D., Kleiman, E.M., Fox, K.R., Nock, M.K., 2016.

Anxiety and its disorders as risk factors for suicidal thoughts and behaviors: A meta-

analytic review. Clin. Psychol. Rev. 43, 30–46.

https://doi.org/10.1016/j.cpr.2015.11.008

Bischl, B., Lang, M., Kotthoff, L., Schiffner, J., Richter, J., Studerus, E., Casalicchio, G.,

Jones, Z.M., 2016. mlr: Machine Learning in R. J. Mach. Learn. Res. 17, 1–5.

Bleeker, S.E., Moll, H.A., Steyerberg, E.W., Donders, A.R.T., Derksen-Lubsen, G., Grobbee,

D.E., Moons, K.G.M., 2003. External validation is necessary in prediction research: A clinical example. J.

Clin. Epidemiol. 56, 826–832. https://doi.org/10.1016/S0895-4356(03)00207-5

Borges, G., Angst, J., Nock, M.K., Ruscio, A.M., Walters, E.E., Kessler, R.C., 2006. A risk

index for 12-month suicide attempts in the National Comorbidity Survey Replication

(NCS-R). Psychol. Med. 36, 1747–1757. https://doi.org/10.1017/S0033291706008786

Borges, G., Nock, M.K., Haro Abad, J.M., Hwang, I., Sampson, N.A., Alonso, J., Andrade,

L.H., Angermeyer, M.C., Beautrais, A., Bromet, E., Bruffaerts, R., de Girolamo, G.,

Florescu, S., Gureje, O., Hu, C., Karam, E.G., Kovess-Masfety, V., Lee, S., Levinson,

D., Medina-Mora, M.E., Ormel, J., Posada-Villa, J., Sagar, R., Tomov, T., Uda, H.,

Williams, D.R., Kessler, R.C., 2010. Twelve-Month Prevalence of and Risk Factors

for Suicide Attempts in the World Health Organization World Mental Health Surveys.

J. Clin. Psychiatry 71, 1617–1628. https://doi.org/10.4088/JCP.08m04967blu

Bradley, A.P., 1997. The use of the area under the ROC curve in the evaluation of machine

learning algorithms. Pattern Recognit. 30, 1145–1159. https://doi.org/10.1016/S0031-3203(96)00142-2

Bronisch, T., Wittchen, H.U., 1994. Suicidal Ideation and Suicide Attempts - Comorbidity

with Depression, Anxiety Disorders, and Substance-Abuse Disorder. Eur. Arch.

Psychiatry Clin. Neurosci. 244, 93–98. https://doi.org/10.1007/BF02193525

Brown, G.K., Beck, A.T., Steer, R.A., Grisham, J.R., 2000. Risk factors for suicide in

psychiatric outpatients: A 20-year prospective study. J. Consult. Clin. Psychol. 68,

371–377. https://doi.org/10.1037/0022-006X.68.3.371

Burke, T.A., Ammerman, B.A., Jacobucci, R., 2019. The use of machine learning in the study

of suicidal and non-suicidal self-injurious thoughts and behaviors: A systematic

review. J. Affect. Disord. 245, 869–884. https://doi.org/10.1016/j.jad.2018.11.073

Cha, C.B., Franz, P.J., Guzmán, E.M., Glenn, C.R., Kleiman, E.M., Nock, M.K., 2018.

Annual Research Review: Suicide among youth - epidemiology, (potential) etiology,

and treatment. J. Child Psychol. Psychiatry 59, 460–482.

https://doi.org/10.1111/jcpp.12831

Christodoulou, E., Ma, J., Collins, G.S., Steyerberg, E.W., Verbakel, J.Y., van Calster, B.,

2019. A systematic review shows no performance benefit of machine learning over

logistic regression for clinical prediction models. J. Clin. Epidemiol.

https://doi.org/10.1016/j.jclinepi.2019.02.004

Collins, G.S., Reitsma, J.B., Altman, D.G., Moons, K., 2015. Transparent reporting of a

multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the

TRIPOD Statement. BMC Med. 13, 1. https://doi.org/10.1186/s12916-014-0241-z



Crede, J., Wirthwein, L., McElvany, N., Steinmayr, R., 2015. Adolescents’ academic

achievement and life satisfaction: the role of parents’ education. Front. Psychol. 6, 52.

https://doi.org/10.3389/fpsyg.2015.00052

Delgado-Gomez, D., Blasco-Fontecilla, H., Alegria, A.A., Legido-Gil, T., Artes-Rodriguez,

A., Baca-Garcia, E., 2011. Improving the accuracy of suicide attempter classification.

Artif. Intell. Med. 52, 165–168. https://doi.org/10.1016/j.artmed.2011.05.004

Delgado-Gomez, D., Blasco-Fontecilla, H., Sukno, F., Socorro Ramos-Plasencia, M., Baca-

Garcia, E., 2012. Suicide attempters classification: Toward predictive models of

suicidal behavior. Neurocomputing 92, 3–8.

https://doi.org/10.1016/j.neucom.2011.08.033

Derogatis, L.R., Lipman, R.S., Covi, L., 1973. The SCL-90-R: An outpatient psychiatric

rating scale: Preliminary report (German adaptation: CIPS). Psychopharmacol. Bull.

9, 13–27.

Fernández-Delgado, M., Cernadas, E., Barro, S., Amorim, D., 2014. Do we Need Hundreds of

Classifiers to Solve Real World Classification Problems? J. Mach. Learn. Res. 15,

3133–3181.

Franklin, J.C., Ribeiro, J.D., Fox, K.R., Bentley, K.H., Kleiman, E.M., Huang, X., Musacchio,

K.M., Jaroszewski, A.C., Chang, B.P., Nock, M.K., 2017. Risk factors for suicidal

thoughts and behaviors: A meta-analysis of 50 years of research. Psychol. Bull. 143,

187–232. https://doi.org/10.1037/bul0000084

Gilpin, L.H., Bau, D., Yuan, B.Z., Bajwa, A., Specter, M., Kagal, L., 2018. Explaining

Explanations: An Approach to Evaluating Interpretability of Machine Learning. ArXiv

Prepr. arXiv, 1806.00069.



Glenn, C.R., Nock, M.K., 2014. Improving the Short-Term Prediction of Suicidal Behavior. Am. J. Prev. Med. 47, S176–S180. https://doi.org/10.1016/j.amepre.2014.06.004
Hahn, T., Nierenberg, A.A., Whitfield-Gabrieli, S., 2017. Predictive analytics in mental health: applications, guidelines, challenges and perspectives. Mol. Psychiatry 22, 37–43. https://doi.org/10.1038/mp.2016.201
Hajian-Tilaki, K., 2013. Receiver Operating Characteristic (ROC) Curve Analysis for Medical Diagnostic Test Evaluation. Casp. J. Intern. Med. 4, 627–635.
Han, J., Batterham, P.J., Calear, A.L., Randall, R., 2018. Factors Influencing Professional Help-Seeking for Suicidality: A Systematic Review. Crisis 39, 175–196. https://doi.org/10.1027/0227-5910/a000485
Helleputte, T., 2017. LiblineaR: Linear Predictive Models Based on the LIBLINEAR C/C++ Library.
Hettige, N.C., Nguyen, T.B., Yuan, C., Rajakulendran, T., Baddour, J., Bhagwat, N., Bani-Fatemi, A., Voineskos, A.N., Mallar Chakravarty, M., De Luca, V., 2017. Classification of suicide attempters in schizophrenia using sociocultural and clinical features: A machine learning approach. Gen. Hosp. Psychiatry 47, 20–28. https://doi.org/10.1016/j.genhosppsych.2017.03.001
Hom, M.A., Stanley, I.H., Joiner, T.E., 2015. Evaluating factors and interventions that influence help-seeking and mental health service utilization among suicidal individuals: A review of the literature. Clin. Psychol. Rev. 40, 28–39. https://doi.org/10.1016/j.cpr.2015.05.006

Joiner, T.E., Conwell, Y., Fitzpatrick, K.K., Witte, T.K., Schmidt, N.B., Berlim, M.T., Fleck, M.P.A., Rudd, M.D., 2005. Four Studies on How Past and Current Suicidality Relate Even When “Everything But the Kitchen Sink” Is Covaried. J. Abnorm. Psychol. 114, 291–303. https://doi.org/10.1037/0021-843X.114.2.291
Just, M.A., Pan, L., Cherkassky, V.L., McMakin, D.L., Cha, C., Nock, M.K., Brent, D., 2017. Machine learning of neural representations of suicide and emotion concepts identifies suicidal youth. Nat. Hum. Behav. 1, 911–919. https://doi.org/10.1038/s41562-017-0234-y
Kessler, R.C., Stein, M.B., Petukhova, M.V., Bliese, P., Bossarte, R.M., Bromet, E.J., Fullerton, C.S., Gilman, S.E., Ivany, C., Lewandowski-Romps, L., Millikan Bell, A., Naifeh, J.A., Nock, M.K., Reis, B.Y., Rosellini, A.J., Sampson, N.A., Zaslavsky, A.M., Ursano, R.J., 2017. Predicting suicides after outpatient mental health visits in the Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS). Mol. Psychiatry 22, 544–551. https://doi.org/10.1038/mp.2016.110
Kessler, R.C., van Loo, H.M., Wardenaar, K.J., Bossarte, R.M., Brenner, L.A., Cai, T., Ebert, D.D., Hwang, I., Li, J., de Jonge, P., Nierenberg, A.A., Petukhova, M.V., Rosellini, A.J., Sampson, N.A., Schoevers, R.A., Wilcox, M.A., Zaslavsky, A.M., 2016. Testing a machine-learning algorithm to predict the persistence and severity of major depressive disorder from baseline self-reports. Mol. Psychiatry 21, 1366–1371. https://doi.org/10.1038/mp.2015.198

Kessler, R.C., Warner, C.H., Ivany, C., Petukhova, M.V., Rose, S., Bromet, E.J., Brown, M., Cai, T., Colpe, L.J., Cox, K.L., Fullerton, C.S., Gilman, S.E., Gruber, M.J., Heeringa, S.G., Lewandowski-Romps, L., Li, J., Millikan-Bell, A.M., Naifeh, J.A., Nock, M.K., Rosellini, A.J., Sampson, N.A., Schoenbaum, M., Stein, M.B., Wessely, S., Zaslavsky, A.M., Ursano, R.J., 2015. Predicting Suicides After Psychiatric Hospitalization in US Army Soldiers: The Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS). JAMA Psychiatry 72, 49–57. https://doi.org/10.1001/jamapsychiatry.2014.1754
Kraemer, H.C., 2010. Epidemiological Methods: About Time. Int. J. Environ. Res. Public Health 7, 29–45. https://doi.org/10.3390/ijerph7010029

Kraemer, H.C., Kazdin, A.E., Offord, D.R., Kessler, R.C., Jensen, P.S., Kupfer, D.J., 1997. Coming to terms with the terms of risk. Arch. Gen. Psychiatry 54, 337–343.
Krstajic, D., Buturovic, L.J., Leahy, D.E., Thomas, S., 2014. Cross-validation pitfalls when selecting and assessing regression and classification models. J. Cheminformatics 6, 10. https://doi.org/10.1186/1758-2946-6-10
Kuhn, M., Johnson, K., 2013. Applied Predictive Modeling, 5th ed. Springer, New York.
Kuo, W.-H., Gallo, J.J., Tien, A.Y., 2001. Incidence of suicide ideation and attempts in adults: the 13-year follow-up of a community sample in Baltimore, Maryland. Psychol. Med. 31, 1181–1191. https://doi.org/10.1017/S0033291701004482
Lee, Y., Ragguett, R.-M., Mansur, R.B., Boutilier, J.J., Rosenblat, J.D., Trevizol, A., Brietzke, E., Lin, K., Pan, Z., Subramaniapillai, M., Chan, T.C.Y., Fus, D., Park, C., Musial, N., Zuckerman, H., Chen, V.C.-H., Ho, R., Rong, C., McIntyre, R.S., 2018. Applications of machine learning algorithms to predict therapeutic outcomes in depression: A meta-analysis and systematic review. J. Affect. Disord. 241, 519–532. https://doi.org/10.1016/j.jad.2018.08.073
Lewinsohn, P.M., Rohde, P., Seeley, J.R., 1995. Adolescent Psychopathology: III. The Clinical Consequences of Comorbidity. J. Am. Acad. Child Adolesc. Psychiatry 34, 510–519. https://doi.org/10.1097/00004583-199504000-00018

Lieb, R., Isensee, B., von Sydow, K., Wittchen, H.U., 2000a. The Early Developmental Stages of Psychopathology Study (EDSP): A methodological update. Eur. Addict. Res. 6, 170–182. https://doi.org/10.1159/000052043
Lieb, R., Wittchen, H.-U., Höfler, M., Fuetsch, M., Stein, M.B., Merikangas, K.R., 2000b. Parental Psychopathology, Parenting Styles, and the Risk of Social Phobia in Offspring: A Prospective-Longitudinal Community Study. Arch. Gen. Psychiatry 57, 859–866. https://doi.org/10.1001/archpsyc.57.9.859
Lobo, J.M., Jiménez-Valverde, A., Real, R., 2008. AUC: a misleading measure of the performance of predictive distribution models. Glob. Ecol. Biogeogr. 17, 145–151. https://doi.org/10.1111/j.1466-8238.2007.00358.x
Maier-Diewald, W., Wittchen, H.-U., Hecht, H., Werner-Eilert, K., 1983. Die Münchner Ereignisliste (MEL) – Anwendungsmanual. Max Planck Institute for Psychiatry, Munich.
Mann, J.J., Ellis, S.P., Waternaux, C.M., Liu, X., Oquendo, M.A., Malone, K.M., Brodsky, B.S., Haas, G.L., Currier, D., 2008. Classification Trees Distinguish Suicide Attempters in Major Psychiatric Disorders: A Model of Clinical Decision Making. J. Clin. Psychiatry 69, 23–31.
Mazza, J.J., Catalano, R.F., Abbott, R.D., Haggerty, K.P., 2011. An Examination of the Validity of Retrospective Measures of Suicide Attempts in Youth. J. Adolesc. Health 49, 532–537. https://doi.org/10.1016/j.jadohealth.2011.04.009

Miché, M., Hofer, P.D., Voss, C., Meyer, A.H., Gloster, A.T., Beesdo-Baum, K., Lieb, R., 2018. Mental disorders and the risk for the subsequent first suicide attempt: results of a community study on adolescents and young adults. Eur. Child Adolesc. Psychiatry 27, 839–848. https://doi.org/10.1007/s00787-017-1060-5
Millner, A.J., Lee, M.D., Nock, M.K., 2015. Single-Item Measurement of Suicidal Behaviors: Validity and Consequences of Misclassification. PLOS ONE 17.
Mushkudiani, N.A., Hukkelhoven, C.W.P.M., Hernández, A.V., Murray, G.D., Choi, S.C., Maas, A.I.R., Steyerberg, E.W., 2008. A systematic review finds methodological improvements necessary for prognostic models in determining traumatic brain injury outcomes. J. Clin. Epidemiol. 61, 331–343. https://doi.org/10.1016/j.jclinepi.2007.06.011
Nock, M.K., Borges, G., Bromet, E.J., Cha, C.B., Kessler, R.C., Lee, S., 2008. Suicide and suicidal behavior. Epidemiol. Rev. 30, 133–154. https://doi.org/10.1093/epirev/mxn002
Nock, M.K., Millner, A.J., Joiner, T.E., Gutierrez, P.M., Han, G., Hwang, I., King, A., Naifeh, J.A., Sampson, N.A., Zaslavsky, A.M., Stein, M.B., Ursano, R.J., Kessler, R.C., 2018. Risk factors for the transition from suicide ideation to suicide attempt: Results from the Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS). J. Abnorm. Psychol. 127, 139–149. https://doi.org/10.1037/abn0000317
Nordström, P., Samuelsson, M., Åsberg, M., 1995. Survival analysis of suicide risk after attempted suicide. Acta Psychiatr. Scand. 91, 336–340. https://doi.org/10.1111/j.1600-0447.1995.tb09791.x

Passos, I.C., Mwangi, B., Cao, B., Hamilton, J.E., Wu, M.-J., Zhang, X.Y., Zunta-Soares, G.B., Quevedo, J., Kauer-Sant’Anna, M., Kapczinski, F., Soares, J.C., 2016. Identifying a clinical signature of suicidality among patients with mood disorders: A pilot study using a machine learning approach. J. Affect. Disord. 193, 109–116. https://doi.org/10.1016/j.jad.2015.12.066
Pavlou, M., Ambler, G., Seaman, S., De Iorio, M., Omar, R.Z., 2016. Review and evaluation of penalised regression methods for risk prediction in low-dimensional data with few events. Stat. Med. 35, 1159–1177. https://doi.org/10.1002/sim.6782
Perkonigg, A., Wittchen, H.-U., 1995a. The Daily-Hassles Scale. Research version. Max Planck Institute for Psychiatry, Munich.
Perkonigg, A., Wittchen, H.-U., 1995b. Scale for investigation of problem-solving competences. Research version. Max Planck Institute for Psychiatry, Munich.
Probst, P., Boulesteix, A.-L., 2018. To Tune or Not to Tune the Number of Trees in Random Forest. J. Mach. Learn. Res. 18, 1–18.
R Core Team, 2017. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna.
Raudys, S.J., Jain, A.K., 1991. Small sample size effects in statistical pattern recognition: Recommendations for practitioners. IEEE Trans. Pattern Anal. Mach. Intell. 3, 252–264.

Reed, V., Gander, F., Pfister, H., Steiger, A., Sonntag, H., Trenkwalder, C., Sonntag, A., Hundt, W., Wittchen, H.-U., 1998. To what degree does the Composite International Diagnostic Interview (CIDI) correctly identify DSM-IV disorders? Testing validity issues in a clinical sample. Int. J. Methods Psychiatr. Res. 7, 142–155. https://doi.org/10.1002/mpr.44
Reznick, J.S., Hegeman, I.M., Kaufman, E.R., Woods, S.W., Jacobs, M., 1992. Retrospective and concurrent self-report of behavioral inhibition and their relation to adult mental health. Dev. Psychopathol. 4, 301–321. https://doi.org/10.1017/S095457940000016X
Ribeiro, J.D., Franklin, J.C., Fox, K.R., Bentley, K.H., Kleiman, E.M., Chang, B.P., Nock, M.K., 2016. Self-injurious thoughts and behaviors as risk factors for future suicide ideation, attempts, and death: a meta-analysis of longitudinal studies. Psychol. Med. 46, 225–236. https://doi.org/10.1017/S0033291715001804
Rice, M.E., Harris, G.T., 2005. Comparing effect sizes in follow-up studies: ROC Area, Cohen’s d, and r. Law Hum. Behav. 29, 615–620. https://doi.org/10.1007/s10979-005-6832-7
Saito, T., Rehmsmeier, M., 2014. The Precision-Recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLOS ONE 10, e0118432. https://doi.org/10.6084/m9.figshare.1245061.v1
Simon, G.E., Johnson, E., Lawrence, J.M., Rossom, R.C., Ahmedani, B., Lynch, F.L., Beck, A., Waitzfelder, B., Ziebell, R., Penfold, R.B., Shortreed, S.M., 2018. Predicting Suicide Attempts and Suicide Deaths Following Outpatient Visits Using Electronic Health Records. Am. J. Psychiatry 175, 951–960. https://doi.org/10.1176/appi.ajp.2018.17101167
Šimundić, A.-M., 2009. Measures of Diagnostic Accuracy: Basic Definitions. J. Int. Fed. Clin. Chem. Lab. Med. 19, 203–211.

Spauwen, J., Krabbendam, L., Lieb, R., Wittchen, H.-U., van Os, J., 2004. Does urbanicity shift the population expression of psychosis? J. Psychiatr. Res. 38, 613–618. https://doi.org/10.1016/j.jpsychires.2004.04.003
Steyerberg, E.W., 2009. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating, Statistics for Biology and Health (series eds. M. Gail, J.M. Samet, B. Singer, A. Tsiatis). Springer, New York. https://doi.org/10.1007/978-0-387-77244-8
Steyerberg, E.W., Vickers, A.J., Cook, N.R., Gerds, T., Gonen, M., Obuchowski, N., Pencina, M.J., Kattan, M.W., 2010. Assessing the Performance of Prediction Models: A Framework for Traditional and Novel Measures. Epidemiology 21, 128–138.
Studerus, E., Ramyead, A., Riecher-Rössler, A., 2017. Prediction of transition to psychosis in patients with a clinical high risk for psychosis: a systematic review of methodology and reporting. Psychol. Med. 47, 1163–1178. https://doi.org/10.1017/S0033291716003494
van Loo, H.M., Cai, T., Gruber, M.J., Li, J., de Jonge, P., Petukhova, M., Rose, S., Sampson, N.A., Schoevers, R.A., Wardenaar, K.J., Wilcox, M.A., Al-Hamzawi, A.O., Andrade, L.H., Bromet, E.J., Bunting, B., Fayyad, J., Florescu, S.E., Gureje, O., Hu, C., Huang, Y., Levinson, D., Medina-Mora, M.E., Nakane, Y., Posada-Villa, J., Scott, K.M., Xavier, M., Zarkov, Z., Kessler, R.C., 2014. Major depressive disorder subtypes to predict long-term course. Depress. Anxiety 31, 765–777. https://doi.org/10.1002/da.22233
Wald, N., Bestwick, J., 2014. Is the area under an ROC curve a valid measure of the performance of a screening or diagnostic test? J. Med. Screen. 21, 51–56. https://doi.org/10.1177/0969141313517497

Walsh, C.G., Ribeiro, J.D., Franklin, J.C., 2018. Predicting suicide attempts in adolescents with longitudinal clinical data and machine learning. J. Child Psychol. Psychiatry 59, 1261–1270. https://doi.org/10.1111/jcpp.12916
Walsh, C.G., Ribeiro, J.D., Franklin, J.C., 2017. Predicting Risk of Suicide Attempts Over Time Through Machine Learning. Clin. Psychol. Sci. 5, 457–469. https://doi.org/10.1177/2167702617691560
Wardenaar, K.J., van Loo, H.M., Cai, T., Fava, M., Gruber, M.J., Li, J., de Jonge, P., Nierenberg, A.A., Petukhova, M.V., Rose, S., Sampson, N.A., Schoevers, R.A., Wilcox, M.A., Alonso, J., Bromet, E.J., Bunting, B., Florescu, S.E., Fukao, A., Gureje, O., Hu, C., Huang, Y.Q., Karam, A.N., Levinson, D., Medina Mora, M.E., Posada-Villa, J., Scott, K.M., Taib, N.I., Viana, M.C., Xavier, M., Zarkov, Z., Kessler, R.C., 2014. The effects of co-morbidity in defining major depression subtypes associated with long-term course and severity. Psychol. Med. 44, 3289–3302. https://doi.org/10.1017/S0033291714000993
Wittchen, H.U., Lachner, G., Wunderlich, U., Pfister, H., 1998a. Test-retest reliability of the computerized DSM-IV version of the Munich Composite International Diagnostic Interview (M-CIDI). Soc. Psychiatry Psychiatr. Epidemiol. 33, 568–578. https://doi.org/10.1007/s001270050095
Wittchen, H.U., Perkonigg, A., Lachner, G., Nelson, C.B., 1998b. Early Developmental Stages of Psychopathology Study (EDSP): Objectives and design. Eur. Addict. Res. 4, 18–27. https://doi.org/10.1159/000018921

Wittchen, H.-U., Pfister, H., 1997. DIA-X-Interviews: Manual für Screening-Verfahren und Interview; Interviewheft Längsschnittuntersuchung (DIA-X-Lifetime); Ergänzungsheft (DIA-X-Lifetime); Interviewheft Querschnittsuntersuchung (DIA-X-12 Monate); Ergänzungsheft (DIA-X-12 Monate); PC-Programm zur Durchführung des Interviews (Längs- und Querschnittuntersuchung); Auswertungsprogramm. Swets & Zeitlinger, Frankfurt.
World Health Organization, 2014. Preventing suicide: A global imperative [WWW Document]. URL https://apps.who.int/iris/bitstream/handle/10665/131056/9789241564779_eng.pdf;jsessionid=ABD7A55A03CF6B869C727FE5E25CEE77?sequence=1 (accessed 3.1.19).
Wright, M.N., Ziegler, A., 2017. ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. J. Stat. Softw. 77, 1–17. https://doi.org/10.18637/jss.v077.i01
Yarkoni, T., Westfall, J., 2017. Choosing Prediction Over Explanation in Psychology: Lessons From Machine Learning. Perspect. Psychol. Sci. 12, 1100–1122. https://doi.org/10.1177/1745691617693393
Table 1
Overview of the performance estimates for each prediction model.
Model         | AUC (M) | AUC (Md) | BS (M) | BS (Md) | Sens  | PPV
Logistic      | 0.828   | 0.825    | 0.179  | 0.190   | 0.223 | 0.704
Lasso         | 0.826   | 0.822    | 0.245  | 0.246   | 0.212 | 0.716
Ridge         | 0.829   | 0.830    | 0.466  | 0.461   | 0.251 | 0.658
Random forest | 0.824   | 0.826    | 0.136  | 0.167   | 0.028 | 0.870
Note. AUC, area under the receiver operating characteristic curve; BS, scaled Brier score; M, mean and Md, median across the 100 resampling results (see Figure 1); Sens, sensitivity; PPV, positive predictive value.
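
For readers who wish to compute measures of this kind for their own data, the following minimal R sketch (base R only) illustrates one way to obtain the AUC, the scaled Brier score, sensitivity, and PPV from out-of-sample predicted probabilities. The variable names, the simulated data, and the 0.5 classification threshold are illustrative assumptions; this is not the code used in the present study.

```r
## Illustrative only: performance measures as reported in Table 1, computed from
## a vector of out-of-sample predicted probabilities (pred) and the observed
## binary outcome (obs; 1 = suicide attempt at follow-up, 0 = none).
perf_measures <- function(obs, pred, threshold = 0.5) {
  # AUC via the rank-sum (Mann-Whitney) formulation
  r   <- rank(pred)
  n1  <- sum(obs == 1)
  n0  <- sum(obs == 0)
  auc <- (sum(r[obs == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  # Brier score, and scaled Brier score relative to a non-informative model
  # that predicts the observed event rate for everyone
  bs        <- mean((pred - obs)^2)
  bs_ref    <- mean((mean(obs) - obs)^2)
  bs_scaled <- 1 - bs / bs_ref
  # Sensitivity and PPV after classifying at a fixed probability threshold
  cls  <- as.integer(pred >= threshold)
  sens <- sum(cls == 1 & obs == 1) / sum(obs == 1)
  ppv  <- sum(cls == 1 & obs == 1) / sum(cls == 1)
  c(AUC = auc, BS_scaled = bs_scaled, Sensitivity = sens, PPV = ppv)
}

# Usage example with simulated data (placeholder values only)
set.seed(1)
obs  <- rbinom(500, 1, 0.05)
pred <- plogis(-3 + 2 * obs + rnorm(500))
round(perf_measures(obs, pred), 3)
```

Note that the scaled Brier score defined this way becomes negative whenever the model's Brier score exceeds that of the non-informative reference model, which is the interpretation used in Figure 1.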

Figure 1. Boxplot of 100 resampling results for each prediction model (see median results in
Table 1). Logistic, Logistic regression model; Rf, Random forest model. Left: Area under the
curve (AUC), including the AUC of 0.58 as reported in the meta-analysis by Franklin et al.
(2017). Right: Scaled Brier score, with values below zero indicating a model
performance/calibration inferior to that of a chance prediction model applied to the validation
dataset.
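
The following R sketch indicates how a plot in the spirit of the left panel of Figure 1 could be drawn: boxplots of resampled AUC values per model, with the meta-analytic AUC of 0.58 (Franklin et al., 2017) added as a dashed reference line. The simulated AUC values below use the means from Table 1 with an arbitrary spread and are placeholders only, not the study's results.

```r
## A minimal, illustrative sketch (simulated data, not the study's results)
set.seed(2)
auc_res <- data.frame(
  model = rep(c("Logistic", "Lasso", "Ridge", "Rf"), each = 100),
  auc   = c(rnorm(100, 0.828, 0.03), rnorm(100, 0.826, 0.03),
            rnorm(100, 0.829, 0.03), rnorm(100, 0.824, 0.03))
)
boxplot(auc ~ model, data = auc_res, ylab = "AUC across 100 resamples")
abline(h = 0.58, lty = 2)  # meta-analytic AUC reported by Franklin et al. (2017)
```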
Table 2
Importance of the 16 baseline predictors in each prediction model, with predictors ordered by decreasing importance according to the logistic regression model.
Predictor | Logistic: β | OR | Rank | % | Lasso: β | OR | Rank* | % | Ridge: β | OR | Rank* | % | Random forest: Importance | Rank*
Prior SA (j) | 0.454 | 1.57 | 1 | 57.5 | 0.439 | 1.55 | 1 | 55.2 | 0.130 | 1.14 | 1 | 13.9 | 4.434 | 1
Education (c) | -0.405 | 0.67 | 2 | 33.3 | -0.361 | 0.70 | 2 | 30.3 | -0.031 | 0.97 | 5 | 3.1 | 0.235 | 10
Prior help-seeking (h) | 0.296 | 1.34 | 3 | 34.5 | 0.276 | 1.32 | 3 | 31.7 | 0.048 | 1.05 | 2 | 4.9 | 0.664 | 3
Any parental mental dx (i) | 0.278 | 1.32 | 4 | 32.1 | 0.226 | 1.25 | 4 | 25.4 | 0.018 | 1.02 | 10 | 1.8 | 0.117 | 14
Parental loss or separation (g) | 0.245 | 1.28 | 5 | 27.8 | 0.216 | 1.24 | 5 | 24.1 | 0.030 | 1.03 | 6 | 3.1 | 0.199 | 11
BI (k) | 0.221 | 1.25 | 6 | 24.8 | 0.193 | 1.21 | 6 | 21.2 | 0.029 | 1.03 | 7 | 3.0 | 0.428 | 5
Number of mental dx (d) | 0.187 | 1.21 | 7 | 20.6 | 0.164 | 1.18 | 7 | 17.8 | 0.043 | 1.04 | 3 | 4.3 | 1.020 | 2
DH (n) | -0.171 | 0.84 | 8 | 15.7 | -0.107 | 0.90 | 9 | 10.2 | -0.007 | 0.99 | 16 | 0.7 | 0.323 | 7
Number of traumatic events (e) | 0.157 | 1.17 | 9 | 17.0 | 0.109 | 1.12 | 8 | 11.6 | 0.016 | 1.02 | 11 | 1.6 | -0.086 | 15
Age (b) | -0.142 | 0.87 | 10 | 13.2 | -0.097 | 0.91 | 11 | 9.3 | -0.011 | 0.99 | 15 | 1.1 | 0.173 | 13
Rural (o) | -0.142 | 0.87 | 11 | 13.2 | -0.097 | 0.91 | 10 | 9.3 | -0.014 | 0.99 | 12 | 1.4 | 0.182 | 12
Sex (a) | 0.106 | 1.11 | 12 | 11.2 | 0.060 | 1.06 | 13 | 6.2 | 0.011 | 1.01 | 14 | 1.1 | 0.045 | 16
PCE (p) | -0.099 | 0.91 | 13 | 9.4 | -0.081 | 0.92 | 12 | 7.8 | -0.019 | 0.98 | 9 | 1.9 | 0.300 | 8
NLE (m) | -0.020 | 0.98 | 14 | 2.0 | 0.000 | 1.00 | 15 | 0.0 | 0.013 | 1.01 | 13 | 1.3 | 0.480 | 4
Rape/Childhood sexual abuse (f) | 0.011 | 1.01 | 15 | 1.1 | 0.013 | 1.01 | 14 | 1.3 | 0.034 | 1.03 | 4 | 3.4 | 0.240 | 9
PE (l) | 0.001 | 1.00 | 16 | 0.1 | 0.000 | 1.00 | 15 | 0.0 | 0.023 | 1.02 | 8 | 2.4 | 0.343 | 6
Note. The letter in parentheses after each predictor corresponds to the ordering of predictors in the section "Selection and assessment of predictors"; Rank*, order according to the predictor ranking of the logistic regression model; β, beta coefficient of the (penalized) logistic regression model; OR, odds ratio; %, OR translated to a percentage. The original importance values of the random forest model were multiplied by 1,000 to avoid displaying too many digits. Prior SA, lifetime suicide attempt reported at baseline; Education: 1 = low, 2 = middle, 3 = high; dx, disorder; BI, behavioral inhibition; Number of mental dx, number of DSM-IV diagnoses; DH, daily hassles; Rural, 0 = living in an urban area; PCE, perceived coping efficacy (higher PCE values denote lower PCE); NLE, negative life events; PE, psychotic experiences.
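
As an illustration of how the two kinds of predictor rankings shown in Table 2 can be produced in R, the sketch below fits a logistic regression (ranking predictors by the absolute value of their beta coefficients) and a random forest with permutation-based variable importance via the ranger package, which is cited in the references. The data set, variable names, and sample size are hypothetical; this is not the study's analysis code, and the penalized (lasso and ridge) coefficients in Table 2 would be obtained analogously from a penalized logistic regression implementation.

```r
## Illustrative sketch with simulated data and hypothetical variable names
library(ranger)

set.seed(3)
dat <- data.frame(
  sa_followup = rbinom(300, 1, 0.10),          # outcome: suicide attempt at follow-up
  prior_sa    = rbinom(300, 1, 0.05),          # example baseline predictors
  education   = sample(1:3, 300, replace = TRUE),
  age         = sample(14:24, 300, replace = TRUE)
)

# Logistic regression: beta coefficients and odds ratios, ranked by |beta|
fit_glm <- glm(sa_followup ~ ., data = dat, family = binomial)
coefs   <- coef(fit_glm)[-1]                   # drop the intercept
ord     <- order(abs(coefs), decreasing = TRUE)
data.frame(beta = coefs[ord], OR = exp(coefs[ord]))

# Random forest: permutation-based variable importance
dat$sa_followup <- factor(dat$sa_followup)     # probability forests need a factor outcome
fit_rf <- ranger(sa_followup ~ ., data = dat, probability = TRUE,
                 importance = "permutation", num.trees = 500)
sort(fit_rf$variable.importance, decreasing = TRUE)
```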
