A Bayesian learning model to predict the risk for cannabis use disorder
Rajapaksha Mudalige Dhanushka S. Rajapaksha a, Francesca Filbey b, Swati Biswas a,*, Pankaj Choudhary a,*
a Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX, USA
b School of Behavioral and Brain Sciences, University of Texas at Dallas, Richardson, TX, USA
ARTICLE INFO

Keywords: Cannabis use disorder; Prediction model; Bayesian methods; Machine learning; Model validation

ABSTRACT

Background: The prevalence of cannabis use disorder (CUD) has been increasing recently and is expected to increase further due to the rising trend of cannabis legalization. To help stem this public health concern, a model is needed that predicts for an adolescent or young adult cannabis user their personalized risk of developing CUD in adulthood. However, there exists no such model that is built using nationally representative longitudinal data.
Methods: We use a novel Bayesian learning approach and data from Add Health (n = 8712), a nationally representative longitudinal study, to build logistic regression models using four different regularization priors: lasso, ridge, horseshoe, and t. The models are compared by their prediction performance on unseen data via 5-fold cross-validation (CV). We assess model discrimination using the area under the curve (AUC) and calibration by comparing the expected (E) and observed (O) number of CUD cases. We also externally validate the final model on independent test data from Add Health (n = 570).
Results: Our final model is based on lasso prior and has seven predictors: biological sex; scores on personality traits of neuroticism, openness, and conscientiousness; and measures of adverse childhood experiences, delinquency, and peer cannabis use. It has good discrimination and calibration performance as reflected by its respective AUC and E/O of 0.69 and 0.95 based on 5-fold CV and 0.71 and 1.10 on validation data.
Conclusion: This externally validated model may help in identifying adolescent or young adult cannabis users at high risk of developing CUD in adulthood.
https://doi.org/10.1016/j.drugalcdep.2022.109476
Received 28 December 2021; Received in revised form 19 April 2022; Accepted 23 April 2022
Available online 29 April 2022
0376-8716/© 2022 Elsevier B.V. All rights reserved.
R.M.D.S. Rajapaksha et al. Drug and Alcohol Dependence 236 (2022) 109476
et al., 2020; Meier et al., 2016; Rajapaksha et al., 2020; Tomko et al., 2019; Verdejo-García et al., 2008; Zhang-James et al., 2020).

With a vast literature available on risk factors for SUD, a natural and practically important next step is to build risk prediction models for SUD. Indeed, such models have been developed for several diseases and disorders, including depression, ADHD, and mental illness (Bernardini et al., 2017; Cattelani et al., 2019; Caye et al., 2019; Chowdhury et al., 2018; D’Agostino et al., 2008; Gail et al., 1989). Although some studies have developed simple cumulative risk indices and/or risk scores for SUD outcomes (Hayatbakhsh et al., 2009; Meier et al., 2016), efforts to build risk prediction models to predict an SUD outcome are a recent development. Rajapaksha et al. (2020) proposed a preliminary model for predicting CUD by applying statistical and machine learning (ML) techniques. Another recent study applied ML techniques to a longitudinal dataset and predicted SUD (Jing et al., 2020). A few other studies also used ML techniques to predict substance use outcomes (Hu et al., 2020; Nasir et al., 2021; Zhang-James et al., 2020; Zoboroski et al., 2021).

However, all these studies have a common and major limitation of being based on data from a limited geographic location or high-risk population (Hayatbakhsh et al., 2009; Hu et al., 2020; Jing et al., 2020; Nasir et al., 2021; Rajapaksha et al., 2020; Zhang-James et al., 2020). Therefore, these models are not generalizable to a larger or nationwide population of substance users and thus may not be suitable for risk assessment in clinical practice. Moreover, some studies have focused only on certain specific types of predictors rather than considering a comprehensive set of risk factors and hence are not suitable for risk prediction as such. Another crucial aspect of any risk prediction model is the use of longitudinal data so that one can ensure that the risk factors have been measured before the development of the outcome rather than their measurements being effects of the outcome. Although some studies used longitudinal data, they did not consider the effect of the longitudinal trajectory of risk factors and used their cross-sectional summaries (Hayatbakhsh et al., 2009; Hu et al., 2020; Jing et al., 2020; Meier et al., 2016). Finally, and perhaps most importantly, none of these tools has been independently validated on external data.

Given the large number of potential risk factors for SUD, for practical utility it is important to ensure that a risk prediction model is parsimonious. This may be achieved using regularization methods as they shrink (penalize) regression coefficients towards zero, thereby avoiding overfitting and improving prediction accuracy on future unseen data (Park and Casella, 2008; Tibshirani, 1996, 2011). Regularization is accomplished by adding a penalty term to the negative log-likelihood function that we need to minimize to fit a model. For example, the penalty term under lasso (ridge) regularization is the sum of absolute values (squares) of the slope coefficients multiplied by a non-negative penalty parameter. This parameter is chosen optimally through cross-validation.

Such regularization can be achieved through Bayesian learning methods in a more flexible way. In the classical framework, the penalty parameter has to be estimated separately from the regression coefficients, whereas in the Bayesian framework the penalty parameter is part of the whole model, which contains not only the regression coefficients but other parameters as well (e.g., variance parameters). All parameters have priors (e.g., in Bayesian lasso, regression coefficients have Laplace priors) and thus regularization takes place within the model building process in an integrated manner (Van Erp et al., 2019). Furthermore, Bayesian regularization methods naturally quantify the uncertainty in estimates through posterior distributions, which is not so straightforward for their classical counterparts (Carvalho et al., 2010; O’Hara and Sillanpaa, 2009; Park and Casella, 2008; Van Erp et al., 2019). Despite these advantages, there is no study in the SUD literature that has used Bayesian learning methods for risk prediction.

In this study, we built Bayesian risk prediction models to predict the risk of developing CUD in adulthood based on risk factors (which may vary over time) measured in adolescence and young adulthood. We used nationally representative longitudinal data from the National Longitudinal Study on Adolescent Health (Add Health) (Harris et al., 2009). A comprehensive set of potential predictors, including demographic, behavioral, personality, and cognitive characteristics of individuals, was considered. The final model was independently validated on an external test dataset, also obtained from Add Health. This study has been approved by the Institutional Review Board of the University of Texas at Dallas.

2. Methods

2.1. Participants

Add Health used a multistage stratified cluster sampling design to ensure that the sample reflected the adolescent population of the United States in terms of urbanicity, region, school size, school type (public/private), and ethnicity (Harris et al., 2009). Adolescents were first enrolled when they were in grades 7–12 during the 1994–95 school year (wave I). They were followed up in 1996 (wave II), 2001–2002 (wave III), and 2008 (wave IV). The most recent wave (wave V) was in 2016–18; however, information regarding SUDs was not collected in this wave, so we used data up to wave IV, during which the participants were adults aged 24–32.

2.2. Data preparation

Our response variable is a binary measure of lifetime diagnosis of CUD, which was measured only in wave IV. For model building, we included only those participants who started using cannabis during any of the first three waves, participated in wave IV, and had survey weights available. Diagnosis of CUD was derived from Add Health items that were originally based on DSM-IV for lifetime diagnosis of cannabis use dependence. Specifically, each item had dichotomous (Yes/No) responses indicating whether or not one engaged in a certain substance dependence behavior corresponding to each of the diagnostic criteria outlined in DSM-IV for cannabis dependence, such as tolerance, withdrawal, etc. An answer of “yes” to three or more questions within a 12-month period was indicative of CUD. This is a widely used criterion to measure CUD (Feingold et al., 2020). Potential participants were excluded if they developed CUD but did not provide an age of CUD onset or had inconsistencies between their reported status of ever using cannabis and their age of first cannabis use across different waves. There were 9491 participants after applying these inclusion and exclusion criteria.

Add Health administered several age-specific questionnaires containing more than 1000 items. We constructed around 50 potential predictors from these questionnaires based on literature review. Some predictors were cross-sectional, measured in a single wave, while the others were longitudinal, measured across multiple waves. We considered two ways of summarizing longitudinal predictors for inclusion in the CUD model: (1) using random effects obtained by fitting a separate linear mixed effects model (LMM) for each predictor relating the predictor to age of participants at different waves (see Supplementary Materials), and (2) taking the average/maximum exposure over different waves (Chen et al., 2015; Dandis et al., 2020).

2.3. Initial variable processing

While most of the predictors came from waves I to III, the following two types came from wave IV. One retrospectively measured adverse childhood experiences (ACEs), which occurred before age 18 and were used to create the ACE scale. The other measured personality traits that are believed to remain relatively stable over time (Caspi et al., 2005; Damian et al., 2019). Further, although CUD was measured in wave IV, a person could have developed CUD for the first time before wave IV. Hence, for each CUD case, we only considered the portion of their data that were recorded before their age of CUD onset. Thus, it was effectively ensured that all predictors were measured before the outcome.
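
The restriction to pre-onset data, together with the average/maximum summary of a longitudinal predictor mentioned in Section 2.2, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code (their analysis was carried out in R): the `Measurement` record, the function names, the example values, and the reading of "before" as strictly younger than the onset age are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Measurement:
    wave: int      # Add Health wave in which the predictor was recorded
    age: float     # participant's age at that wave
    value: float   # predictor value, assumed already scaled to [0, 1]


def pre_onset(measurements: List[Measurement], onset_age: Optional[float]) -> List[Measurement]:
    """Keep only measurements taken before CUD onset; controls (onset_age=None) keep all."""
    if onset_age is None:
        return list(measurements)
    # Assumption: "before" is read as strictly younger than the reported onset age.
    return [m for m in measurements if m.age < onset_age]


def summarize(measurements: List[Measurement]) -> dict:
    """Average and maximum exposure over the retained waves."""
    values = [m.value for m in measurements]
    return {"average": mean(values), "maximum": max(values)}


# Hypothetical participant: three waves of a scaled delinquency score.
history = [Measurement(1, 15.0, 0.10), Measurement(2, 16.0, 0.20), Measurement(3, 21.0, 0.40)]

case_history = pre_onset(history, onset_age=19.0)      # wave 3 is dropped
control_history = pre_onset(history, onset_age=None)   # nothing is dropped

case_summary = summarize(case_history)
control_summary = summarize(control_history)
```

For the hypothetical case with onset at age 19, only the wave 1 and wave 2 values enter the summary, so the predictor cannot reflect anything that happened after the outcome.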
Some predictors had substantial missing data. Therefore, an initial variable filtering process was implemented to identify potentially important predictors while maximizing the number of participants with complete data. As complex survey design must be accounted for even in this initial variable filtering, we fitted survey multiple logistic regression models using varying sample sizes (corresponding to different proportions of missing values) and considered a variable to be potentially important if its p-value was less than 0.3 in at least one of the models. This process identified 15 predictors (see Table 1) and 8712 participants with complete data on them. All quantitative predictors were scaled/transformed to lie within [0,1] by dividing by their maximum possible value (all except TV hours) or their maximum value observed in the data (TV hours). This helps in ensuring portability of the model to data wherein predictors are measured on different scales. This final data set is used as the training data.

2.4. Statistical methods

2.4.1. Variable selection and Bayesian learning models

We used a Bayesian learning approach within the framework of logistic regression models with random effects. This involves specifying priors for the unknown model parameters. The priors on regression coefficients serve to regularize them (Gelman et al., 2014). We used four different regularization priors: lasso, ridge, horseshoe, and t (Van Erp et al., 2019). All models incorporated the complex survey sampling design by weighting each participant’s data with their survey weight in the likelihood and including the stratification variable region as a fixed covariate and the clustering variable school as a random effect (see Supplementary Materials for details).

As priors for regression coefficients are continuous, they do not automatically provide variable selection. Therefore, we used two methods for variable selection after fitting the full models with 15 predictors: credible interval and probability thresholding (Li and Lin, 2010; Van Erp et al., 2019). Briefly, in the first method, a predictor is selected if a credible interval for its regression coefficient excludes 0. The level of the interval is varied, e.g., 70%, 80%, 95%, etc. As the level increases, the interval becomes wider, it includes 0 more often, and hence fewer predictors are selected. The second method excludes a variable if the posterior probability that its regression coefficient is within ±1 posterior standard deviation (SD) of 0 exceeds a certain threshold. The threshold is varied, e.g., 0.3, 0.4, 0.5, etc. A higher threshold leads to fewer variables meeting the exclusion requirement and hence to fewer excluded variables. By applying these variable selection methods with different levels/thresholds and different regularization priors, we obtained a total of 40 competing models. Each model provides a predicted probability of CUD for an individual (see Supplementary Materials for its calculation).

2.4.2. Using prediction accuracy to obtain the best model

To identify the best among the competing models, we compared their prediction accuracy on unseen data via 5-fold cross-validation (CV) (James et al., 2013) and Bayesian leave one out (LOO) cross-validation (Vehtari et al., 2017). These approaches provide a good assessment of model performance on unseen data by protecting against overfitting. The model discrimination is assessed by area under the receiver operating characteristic curve (AUC) based on 5-fold CV and the Bayesian LOO estimate. Higher AUC and LOO indicate better prediction accuracy. To evaluate model calibration, we compared the expected number of cases (E) based on 5-fold CV with the observed (O) number of cases. The closer the E/O value is to 1, the better is the model performance.

2.4.3. External validation

To externally validate the final proposed model, we utilized an independent test dataset. It consisted of Add Health participants whose survey weights were missing and hence were not included in the training data. We calculated their risk of developing CUD using the final model and compared with their actual CUD status. Then we calculated AUC and E/O.

More information about statistical methods is available in Supplementary Materials. The models were fitted using the statistical software system R (R Core Team, 2019) with the following packages: RStan (Stan Development Team, 2020) for Bayesian inference, lme4 (Bates et al., 2015) to summarize longitudinal predictors using random effects, survey (Lumley, 2004) for initial variable filtering, loo for cross-validation (Vehtari et al., 2020), and pROC (Robin et al., 2011) for AUC.
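
The two variable selection rules of Section 2.4.1 and the two performance measures of Section 2.4.2 are simple enough to sketch in plain Python. The block below is an illustration under stated assumptions, not the authors' implementation (which used R with RStan, loo, and pROC): the posterior draws are simulated rather than taken from a fitted model, the credible interval is equal-tailed with nearest-rank quantiles, the AUC is the unweighted rank-based estimate, and E is taken to be the sum of predicted risks. All function names are hypothetical.

```python
import random
import statistics


def credible_interval(draws, level):
    """Equal-tailed interval from posterior draws, using nearest-rank quantiles."""
    ordered = sorted(draws)
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    return ordered[round(tail * last)], ordered[round((1.0 - tail) * last)]


def selected_by_interval(draws, level=0.95):
    """Rule 1: select the predictor if the credible interval excludes 0."""
    lower, upper = credible_interval(draws, level)
    return lower > 0 or upper < 0


def excluded_by_threshold(draws, threshold=0.5):
    """Rule 2: exclude if P(|coefficient| < 1 posterior SD) exceeds the threshold."""
    sd = statistics.stdev(draws)
    prob_near_zero = sum(abs(d) < sd for d in draws) / len(draws)
    return prob_near_zero > threshold


def auc(outcomes, risks):
    """Share of case-control pairs in which the case has the higher risk (ties count 1/2)."""
    cases = [r for y, r in zip(outcomes, risks) if y == 1]
    controls = [r for y, r in zip(outcomes, risks) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def expected_over_observed(outcomes, risks):
    """E/O: sum of predicted risks divided by the observed number of cases."""
    return sum(risks) / sum(outcomes)


rng = random.Random(0)
strong = [rng.gauss(1.7, 0.3) for _ in range(4000)]   # simulated draws, clearly away from 0
weak = [rng.gauss(0.0, 0.3) for _ in range(4000)]     # simulated draws, centred at 0

outcomes = [0, 0, 1, 1]                # toy CUD status (1 = case)
risks = [0.10, 0.40, 0.35, 0.80]       # toy out-of-fold predicted risks
toy_auc = auc(outcomes, risks)
toy_eo = expected_over_observed(outcomes, risks)
```

In this toy setting the coefficient with draws centred at 1.7 is kept by both rules and the one centred at 0 is dropped by both. In the study itself, the risks passed to the two metrics would be predictions for held-out folds (or for the external test set), not in-sample fits.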
Table 1
Summary of predictors that passed initial variable filtering.

| Predictor | Description |
|---|---|
| Biological sex | 0 = Female, 1 = Male |
| Race | 1 = White, 2 = Black/African American, 3 = American Indian/Native American, 4 = Asian, 5 = Other |
| ACE scale | Number of adverse childhood experiences that occurred before age 18. |
| Neuroticism scale | Measure of general tendency to experience negative feelings. A higher value implies a greater tendency. |
| Conscientiousness scale | Measure of forward planning, organization, and ability to carry out tasks. A higher value implies greater conscientiousness. |
| Agreeableness scale | Measure of compassion, eagerness to cooperate, and tendency to avoid conflict. A higher value implies more agreeableness. |
| Openness scale | Measure of openness to new experiences and imaginativeness. A higher value implies more openness. |
| Anxiety scale | Number of times experienced anxiety symptoms. |
| Depression scale | Number of times experienced depression symptoms. |
| Delinquency scale | Number of times involved in delinquent activities. A higher value implies a greater involvement. |
| Violence victimization scale | Number of times experienced violent incidents. |
| Peer alcohol use | Number of best friends (out of 3 best friends) who drink alcohol at least once a month. |
| Peer cannabis use | Number of best friends (out of 3 best friends) who use cannabis at least once a month. |
| Peer smoking | Number of best friends (out of 3 best friends) who smoke at least 1 cigarette a day. |
| TV hours | Number of hours per week watched television. |

3. Results

3.1. Sample characteristics

In our final dataset (n = 8712), the unweighted and weighted prevalence of lifetime CUD are 7.51% and 7.84%, respectively. As seen in Table 2, compared to controls (i.e., the non-CUD participants), the cases are more likely to be males (61% vs 50%); and on average experienced more ACEs (0.29 vs 0.25) and reported higher neuroticism (0.56 vs 0.52) and openness (0.76 vs 0.73) but lower conscientiousness (0.70 vs 0.73). Table 3 presents sample characteristics for longitudinal predictors across the waves in which they were measured. Generally, depressive symptoms, peer alcohol use, peer cannabis use, and peer smoking increased over time while delinquent activities and violence victimization decreased. For all the predictors, the average score of a predictor was at least as high for cases as for controls.

3.2. Results from Bayesian learning models

As described in the Methods section, we applied four shrinkage priors and performed variable selection among 15 predictors that passed the initial filtering using credible interval and thresholding methods. Interestingly, the model with highest prediction accuracy for each prior contained the same seven predictors: biological sex, ACE, conscientiousness, neuroticism, openness, delinquency, and peer cannabis use. Next, we fitted models with the same priors but with data on only these seven predictors, resulting in a slightly larger sample size of 8753 due to
Table 3
Summary of longitudinal predictors: Mean (standard deviation) across different waves. The p-values are based on a logistic regression model with SUD status as
response and random effects associated with the longitudinal predictor as covariates.
Wave 1 Wave 2 Wave 3
Variable Overall Cases Controls Overall Cases Controls Overall Cases Controls p-value
Anxiety scale 0.15 (0.10) 0.16 (0.11) 0.15 (0.10) 0.15 (0.10) 0.16 (0.10) 0.15 (0.10) 0.006
Depression scale 0.27 (0.09) 0.29 (0.10) 0.27 (0.09) 0.28 (0.09) 0.28 (0.09) 0.28 (0.09) 0.30 (0.11) 0.33 (0.11) 0.30 (0.11) 0.008
Delinquency scale 0.12 (0.13) 0.16 (0.14) 0.12 (0.13) 0.09 (0.11) 0.12 (0.13) 0.08 (0.11) 0.03 (0.06) 0.05 (0.07) 0.03 (0.06) < 0.001
Violence victimization scale 0.05 (0.07) 0.06 (0.08) 0.05 (0.07) 0.02 (0.06) 0.03 (0.07) 0.02 (0.06) 0.02 (0.06) 0.02 (0.06) 0.02 (0.05) 0.002
Peer alcohol use 0.45 (0.40) 0.48 (0.41) 0.45 (0.40) 0.47 (0.40) 0.47 (0.40) 0.46 (0.40) 0.66 (0.39) 0.72 (0.36) 0.65 (0.39) 0.01
Peer cannabis use 0.28 (0.36) 0.34 (0.39) 0.28 (0.36) 0.33 (0.37) 0.4 (0.40) 0.32 (0.37) <0.001
Peer smoking 0.34 (0.38) 0.36 (0.39) 0.33 (0.37) 0.37 (0.39) 0.42 (0.40) 0.37 (0.39) <0.001
Table 5
Final Bayesian learning model using lasso prior: Posterior means of regression coefficients, posterior means of odds ratios (OR), 95% credible intervals for OR, posterior means of odds ratio (OR*) on original scale (only for continuous cross-sectional variables), and 95% credible intervals for OR*.

| Variable | Posterior mean of coefficient | Posterior mean of OR | Credible interval for OR | Posterior mean for OR* | Credible interval for OR* |
|---|---|---|---|---|---|
| Intercept | -4.68 | | | | |
| Biological sex | 0.33 | 1.40 | (1.17, 1.66) | | |
| ACE scale | 0.51 | 1.71 | (1.08, 2.56) | 1.06 | (1.01, 1.11) |
| Conscientiousness scale | -1.14 | 0.34 | (0.17, 0.58) | 0.94 | (0.92, 0.97) |
| Neuroticism scale | 1.72 | 5.86 | (3.01, 10.49) | 1.09 | (1.06, 1.12) |
| Openness scale | 1.68 | 5.71 | (2.66, 10.89) | 1.09 | (1.05, 1.13) |
| Delinquency scale | 3.84 | 50.07 | (22.43, 98.38) | | |
| Peer cannabis use | 0.67 | 2.01 | (1.24, 3.06) | | |
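
To give a rough sense of how the coefficients in Table 5 map to a predicted risk, the sketch below plugs the posterior means into the inverse-logit function. It is a deliberately simplified illustration and not the calculation described in the Supplementary Materials: it ignores the region fixed effect and the school random effect, it uses posterior means instead of averaging over posterior draws, and it assumes that every input is already on the [0, 1] scale used for model fitting (for the longitudinal predictors, delinquency and peer cannabis use, in whatever summarized form the final model uses). The function name and all example values are hypothetical.

```python
import math

# Posterior means of the coefficients as reported in Table 5.
COEFFICIENTS = {
    "intercept": -4.68,
    "biological_sex": 0.33,       # 0 = female, 1 = male
    "ace": 0.51,
    "conscientiousness": -1.14,
    "neuroticism": 1.72,
    "openness": 1.68,
    "delinquency": 3.84,
    "peer_cannabis_use": 0.67,
}


def plug_in_risk(predictors):
    """Inverse-logit of the linear predictor built from posterior-mean coefficients."""
    linear = COEFFICIENTS["intercept"]
    for name, value in predictors.items():
        linear += COEFFICIENTS[name] * value
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical male user; every value below is made up for illustration.
example = {
    "biological_sex": 1,
    "ace": 0.25,
    "conscientiousness": 0.73,
    "neuroticism": 0.52,
    "openness": 0.73,
    "delinquency": 0.10,
    "peer_cannabis_use": 0.30,
}
example_risk = plug_in_risk(example)

# Same user with a higher neuroticism score: the plug-in risk increases,
# in line with the positive coefficient in Table 5.
higher_neuroticism_risk = plug_in_risk(dict(example, neuroticism=0.90))
```

Because the posterior mean of a nonlinear function differs from the function of the posterior mean, a plug-in value of this kind will generally not coincide with a risk obtained by averaging over the full posterior, which is one reason the sketch should not be used as a risk calculator.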
Spiegelman et al., 1994). Although future validation studies are important for building more confidence, the model is ready to be considered for potential adoption in practice. To aid in this task, an R package is under construction (relevant steps are provided in Supplementary Materials). The model can help in identifying adolescent or young adult cannabis users who are at high risk of developing CUD in adulthood. Such users may then be provided with appropriate intervention and prevention measures to help them divert from the path towards CUD. The model may be also helpful in medical settings where patients are considering using medical cannabis in consultation with clinicians.

Role of funding sources

This work was funded by the University of Texas at Dallas SPIRe seed grant. The sponsor had no role in the study design, collection, analysis or interpretation of data, writing the manuscript and the decision to submit this manuscript for publication.

CRediT authorship contribution statement

SB, PKC, and RMDR conceived the study. RMDR carried out all data pre-processing and analyses. SB and PKC supervised RMDR throughout the entire project. FF provided subject matter expertise in designing the study and interpreting the results. All authors participated in interpreting the results and writing manuscript. All authors have read and approved the final version of the manuscript.

Acknowledgement

The data used in this work are from Add Health, a program project directed by Kathleen Mullan Harris and designed by J. Richard Udry, Peter S. Bearman, and Kathleen Mullan Harris at the University of North Carolina at Chapel Hill and funded by grant P01-HD31921 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, with cooperative funding from 23 other federal agencies and foundations. Information on how to obtain the Add Health data files is available on the Add Health website (http://www.cpc.unc.edu/addhealth). No direct support was received from grant P01-HD31921 for this analysis. The authors thank Thanthirige Lakshika Ruberu for helping with initial exploration of the data.

Declaration of competing interest

None.

Appendix A. Supporting information

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.drugalcdep.2022.109476.

References

Afuseh, E., Pike, C.A., Oruche, U.M., 2020. Individualized approach to primary prevention of substance use disorder: age-related risks. Subst. Abus. Treat. Prev. Policy 15 (1), 58.
Bates, D., Maechler, M., Bolker, B., Walker, S., 2015. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67 (1), 1–48.
Beaton, D., Abdi, H., Filbey, F.M., 2014. Unique aspects of impulsive traits in substance use and overeating: specific contributions of common assessments of impulsivity. Am. J. Drug Alcohol Abus. 40 (6), 463–475.
Bernardini, F., Attademo, L., Cleary, S.D., Luther, C., Shim, R.S., Quartesan, R., Compton, M.T., 2017. Risk prediction models in psychiatry: toward a new frontier for the prevention of mental illnesses. J. Clin. Psychiatry 78 (5), 572–583.
Bridgeman, M.B., Abazia, D.T., 2017. Medicinal cannabis: History, pharmacology, and implications for the acute care setting. Pharm. Ther. 42 (3), 180–188.
Carvalho, C.M., Polson, N.G., Scott, J.G., 2010. The horseshoe estimator for sparse signals. Biometrika 97 (2), 465–480.
Caspi, A., Roberts, B.W., Shiner, R.L., 2005. Personality development: stability and change. Annu Rev. Psychol. 56, 453–484.
Cattelani, L., Murri, M.B., Chesani, F., Chiari, L., Bandinelli, S., Palumbo, P., 2019. Risk prediction model for late life depression: development and validation on three large European datasets. IEEE J. Biomed. Health Inf. 23 (5), 2196–2204.
Caye, A., Agnew-Blais, J., Arseneault, L., Gonçalves, H., Kieling, C., Langley, K., Menezes, A.M.B., Moffitt, T.E., Passos, I.C., Rocha, T.B., Sibley, M.H., Swanson, J.M., Thapar, A., Wehrmeister, F., Rohde, L.A., 2019. A risk calculator to predict adult attention-deficit/hyperactivity disorder: Generation and external validation in three birth cohorts and one clinical sample. Epidemiol. Psychiatr. Sci. 29, e37.
Tomko, R., Williamson, N.A., McRae-Clark, A., Gray, K.M., 2019. Cannabis use disorder as a developmental disorder. In: Montoya, I., Weiss, D., R. B., S. (Eds.), Cannabis Use Disorders. Springer, New York, pp. 189–199.
WHO, 2020. Management of Substance Abuse: Cannabis. World Health Organization (Accessed 23 Sep, 2020). 〈https://www.who.int/substance_abuse/facts/cannabis/en/〉.
CDC, 2020. High-Risk Substance Use Among Youth. 〈https://www.cdc.gov/healthyyouth/substance-use/index.htm#4〉. (Accessed 05 Aug 2021).
Chen, Y.H., Ferguson, K.K., Meeker, J.D., McElrath, T.F., Mukherjee, B., 2015. Statistical methods for modeling repeated measures of maternal environmental exposure biomarkers during pregnancy in association with preterm birth. Environ. Health 14, 9.
Chowdhury, M., Euhus, D., Arun, B., Umbricht, C., Biswas, S., Choudhary, P., 2018. Validation of a personalized risk prediction model for contralateral breast cancer. Breast Cancer Res Treat. 170 (2), 415–423.
Costantino, J.P., Gail, M.H., Pee, D., Anderson, S., Redmond, C.K., Benichou, J., Wieand, H.S., 1999. Validation studies for models projecting the risk of invasive and total breast cancer incidence. J. Natl. Cancer Inst. 91 (18), 1541–1548.
D’Agostino, R.B., Vasan, R.S., Pencina, M.J., Wolf, P.A., Cobain, M., Massaro, J.M., Kannel, W.B., 2008. General cardiovascular risk profile for use in primary care: The Framingham Heart Study. Circulation 117 (6), 743–753.
Damian, R.I., Spengler, M., Sutu, A., Roberts, B.W., 2019. Sixteen going on sixty-six: a longitudinal study of personality stability and change across 50 years. J. Pers. Soc. Psychol. 117 (3), 674–695.
Dandis, R., Teerenstra, S., Massuger, L., Sweep, F., Eysbouts, Y., IntHout, J., 2020. A tutorial on dynamic risk prediction of a binary outcome based on a longitudinal biomarker. Biom. J. 62 (2), 398–413.
Douglas, K.R., Chan, G., Gelernter, J., Arias, A.J., Anton, R.F., Weiss, R.D., Brady, K., Poling, J., Farrer, L., Kranzler, H.R., 2010. Adverse childhood events as risk factors for substance dependence: partial mediation by mood and anxiety disorders. Addict. Behav. 35 (1), 7–13.
Feingold, D., Livne, O., Rehm, J., Lev-Ran, S., 2020. Probability and correlates of transition from cannabis use to DSM-5 cannabis use disorder: results from a large-scale nationally representative study. Drug Alcohol Rev. 39 (2), 142–151.
Gail, M.H., Brinton, L.A., Byar, D.P., Corle, D.K., Green, S.B., Schairer, C., Mulvihill, J.J., 1989. Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J. Natl. Cancer Inst. 81 (24), 1879–1886.
Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., Rubin, D.B., 2014. Bayesian Data Analysis, third ed. CRC Press, Boca Raton.
Gray, K.M., Squeglia, L.M., 2018. Research review: What have we learned about adolescent substance use? J. Child Psychol. Psychiatry 59 (6), 618–627.
Harris, K.M., Halpern, C.T., Whitsel, E., Hussey, J., Tabor, J., Entzel, P., Udry, J.R., 2009. The National Longitudinal Study of Adolescent to Adult Health: Research Design. 〈https://addhealth.cpc.unc.edu/documentation/study-design/〉. (Accessed 05 Aug, 2021).
Hasin, D.S., Sarvet, A.L., Cerdá, M., Keyes, K.M., Stohl, M., Galea, S., Wall, M.M., 2017. US adult illicit cannabis use, cannabis use disorder, and medical marijuana laws: 1991-1992 to 2012-2013. JAMA Psychiatry 74 (6), 579–588.
Hayatbakhsh, M.R., Najman, J.M., Bor, W., O’Callaghan, M.J., Williams, G.M., 2009. Multiple risk factor model predicting cannabis use and use disorders: a longitudinal study. Am. J. Drug Alcohol Abus. 35 (6), 399–407.
Heilig, M., MacKillop, J., Martinez, D., Rehm, J., Leggio, L., Vanderschuren, L.J.M.J., 2021. Addiction as a brain disease revised: why it still matters, and the need for consilience. Neuropsychopharmacology 46 (10), 1715–1723.
Hu, Z., Jing, Y., Xue, Y., Fan, P., Wang, L., Vanyukov, M., Kirisci, L., Wang, J., Tarter, R.E., Xie, X.Q., 2020. Analysis of substance use and its outcomes by machine learning: II. Derivation and prediction of the trajectory of substance use severity. Drug Alcohol Depend. 206, 107604.
James, G., Witten, D., Hastie, T., Tibshirani, R., 2013. An Introduction to Statistical Learning: with Applications in R. Springer, New York.
Jing, Y., Hu, Z., Fan, P., Xue, Y., Wang, L., Tarter, R.E., Kirisci, L., Wang, J., Vanyukov, M., Xie, X.Q., 2020. Analysis of substance use and its outcomes by machine learning I. Childhood evaluation of liability to substance use disorder. Drug Alcohol Depend. 206, 107605.
Ketcherside, A., Jeon-Slaughter, H., Baine, J.L., Filbey, F.M., 2016. Discriminability of personality profiles in isolated and co-morbid marijuana and nicotine users. Psychiatry Res 238, 356–362.
Knapp, A.A., Lee, D.C., Borodovsky, J.T., Auty, S.G., Gabrielli, J., Budney, A.J., 2019. Emerging trends in cannabis administration among adolescent cannabis users. J. Adolesc. Health 64 (4), 487–493.
Koh, P.K., Peh, C.X., Cheok, C., Guo, S., 2017. Violence, delinquent behaviors, and drug use disorders among adolescents from an addiction-treatment sample. J. Child Adolesc. Subst. Abus. 26 (6), 463–471.
Li, Q., Lin, N., 2010. The Bayesian elastic net. Bayesian Anal. 5 (1), 151–170.
Lowe, C.C., Miller, B.L., Stogner, J., 2020. Comfortably numb? Revisiting and re-specifying the relationship between health strain and substance use. Crime. Delinq. 66 (13–14), 1937–1959.
Lumley, T., 2004. Analysis of complex survey samples. J. Stat. Softw. 9 (8), 1–19.
Marel, C., Sunderland, M., Mills, K.L., Slade, T., Teesson, M., Chapman, C., 2019. Conditional probabilities of substance use disorders and associated risk factors: progression from first use to use disorder on alcohol, cannabis, stimulants, sedatives and opioids. Drug Alcohol Depend. 194, 136–142.
Meier, M.H., Hall, W., Caspi, A., Belsky, D.W., Cerdá, M., Harrington, H.L., Houts, R., Poulton, R., Moffitt, T.E., 2016. Which adolescents develop persistent substance dependence in adulthood? Using population-representative longitudinal data to inform universal risk assessment. Psychol. Med 46 (4), 877–889.
Min, J.W., Chang, M.C., Lee, H.K., Hur, M.H., Noh, D.Y., Yoon, J.H., Jung, Y., Yang, J.H., Society, K.B.C., 2014. Validation of risk assessment models for predicting the incidence of breast cancer in korean women. J. Breast Cancer 17 (3), 226–235.
Moss, H.B., Ge, S., Trager, E., Saavedra, M., Yau, M., Ijeaku, I., Deas, D., 2020. Risk for substance use disorders in young adulthood: Associations with developmental experiences of homelessness, foster care, and adverse childhood experiences. Compr. Psychiatry 100, 152175.
Nasir, M., Summerfield, N.S., Oztekin, A., Knight, M., Ackerson, L.K., Carreiro, S., 2021. Machine learning-based outcome prediction and novel hypotheses generation for substance use disorder treatment. J. Am. Med Inf. Assoc. 28 (6), 1216–1224.
NCDAS, 2018. National Center for Drug Abuse Statistics. 〈https://drugabusestatistics.org/〉. (Accessed 05 Aug 2021).
NIDA, 2017. Trends and Statistics. 〈https://archives.drugabuse.gov/trends-statistics/costs-substance-abuse〉. (Accessed 05 Aug 2021).
NIDA, 2019. Media Guide: Most Commonly Used Addictive Drugs. National Institute on Drug Abuse. 〈https://www.drugabuse.gov/publications/media-guide/most-commonly-used-addictive-drugs〉 (Accessed 23 Sep, 2020).
SAMHSA, 2016. Facing Addiction in America: The Surgeon General’s Report on Alcohol, Drugs, and Health. 〈https://www.hhs.gov/surgeongeneral/reports-and-publications/index.html〉. (Accessed Aug 05, 2021).
SAMHSA, 2020. Key substance use and mental health indicators in the united states: Results from the 2019 national survey on drug use and health. 〈https://www.samhsa.gov/data/〉. (Accessed 05 Aug 2021).
Schulenberg, J.E., Johnston, L.D., O’Malley, P.M., Bachman, J.G., Miech, R.A., Patrick, M.E., 2021. Monitoring the future national survey results on drug use, 1975–2019. Volume II, college students & adults ages 19–60. 〈http://www.monitoringthefuture.org/pubs.html#monographs〉. (Accessed Nov, 21. 2021).
Spiegelman, D., Colditz, G.A., Hunter, D., Hertzmark, E., 1994. Validation of the Gail et al. model for predicting individual breast cancer risk. J. Natl. Cancer Inst. 86 (8), 600–607.
Spindle, T.R., Bonn-Miller, M.O., Vandrey, R., 2019. Changing landscape of cannabis: novel products, formulations, and methods of administration. Curr. Opin. Psychol. 30, 98–102.
Stan Development Team, 2020. rstan: R Interface to Stan, 2.21.2 ed.
Tibshirani, R., 1996. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Methodol. 58 (1), 267–288.
Tibshirani, R., 2011. Regression shrinkage and selection via the lasso: a retrospective. J. R. Stat. Soc., Ser. B, Stat. Methodol. 73, 273–282.
Van Erp, S., Oberski, D.L., Mulder, J., 2019. Shrinkage priors for Bayesian penalized regression. J. Math. Psychol. 89, 31–50.
Vehtari, A., Gelman, A., Gabry, J., 2017. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat. Comput. 27 (5), 1413–1432.
O’Hara, R.B., Sillanpaa, M.J., 2009. A review of Bayesian variable selection methods: Vehtari, A., Gabry, J., Magnusson, M., Yao, Y., Bürkner, P., Gelman, A., 2020. loo:
What, how and which. Bayesian Anal. 4 (1), 85–117. Efficient leave-one-out cross-validation and WAIC for Bayesian models. Statistics and
Park, T., Casella, G., 2008. The Bayesian Lasso. J. Am. Stat. Assoc. 103 (482), 681–686. Computing.
R Core Team, 2019. R: A language and environment for statistical computing. R Verdejo-García, A., Lawrence, A.J., Clark, L., 2008. Impulsivity as a vulnerability marker
Foundation for Statistical Computing, Vienna, Austria. for substance-use disorders: review of findings from high-risk research, problem
Rajapaksha, R.M.D.S., Hammonds, R., Filbey, F., Choudhary, P.K., Biswas, S., 2020. gamblers and genetic association studies. Neurosci. Biobehav Rev. 32 (4), 777–810.
A preliminary risk prediction model for cannabis use disorder. Prev. Med Rep. 20, Walsh, D., McCartney, G., Smith, M., Armour, G., 2019. Relationship between childhood
101228. socioeconomic position and adverse childhood experiences (ACEs): a systematic
Richmond-Rakerd, L.S., Fleming, K.A., Slutske, W.S., 2016. Investigating progression in review. J. Epidemiol. Community Health 73 (12), 1087–1093.
substance use initiation using a discrete-time multiple event process survival mixture Zhang-James, Y., Chen, Q., Kuja-Halkola, R., Lichtenstein, P., Larsson, H., Faraone, S.V.,
(MEPSUM) approach. Clin. Psychol. Sci. 4 (2), 167–182. 2020. Machine-learning prediction of comorbid substance use disorders in ADHD
Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchez, J.C., Müller, M., 2011. youth using Swedish registry data. J. Child Psychol. Psychiatry 61 (12), 1370–1379.
pROC: An open-source package for R and S+ to analyze and compare ROC curves. Zoboroski, L., Wagner, T., Langhals, B., 2021. Classical and neural network machine
BMC Bioinforma. 12, 77. learning to determine the risk of marijuana use. Int J. Environ. Res Public Health 18
(14), 7466.