Bioinformatics, YYYY, 0–0
doi: 10.1093/bioinformatics/xxxxx
Advance Access Publication Date: DD Month YYYY
Manuscript Category
Y.Nam et al.
1 Introduction
Comorbidity describes when a given patient has one or more additional medical conditions co-occurring alongside a primary disease (Lee, et al.,

mechanisms, incorporating this directional information can improve the ability to predict comorbidity. For example, cardiovascular diseases have a synergistic association with low-density lipoprotein cholesterol (LDL-C) and an antagonistic association with high-density lipoprotein cholesterol (HDL-C): high LDL-C may increase the risk of cardiovascular
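\[
w_{ij}
= \frac{\sum_{k \in S} r_{ik}\, r_{jk}}
       {\sqrt{\sum_{k=1}^{K} r_{ik}^{2}}\;\sqrt{\sum_{k=1}^{K} r_{jk}^{2}}}
= \frac{\sum_{k \in S} \dfrac{\beta_{ik}}{\mathrm{SE}_{ik}} \cdot \dfrac{\beta_{jk}}{\mathrm{SE}_{jk}}}
       {\sqrt{\sum_{k=1}^{K} \left(\dfrac{\beta_{ik}}{\mathrm{SE}_{ik}}\right)^{2}}\;
        \sqrt{\sum_{k=1}^{K} \left(\dfrac{\beta_{jk}}{\mathrm{SE}_{jk}}\right)^{2}}}
\tag{1}
\]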
where 𝑟𝑖𝑘 and 𝑟𝑗𝑘 are the respective z-scores associated with SNP 𝑘 for diseases 𝒗𝒊 and 𝒗𝒋; 𝐾 is the total number of SNPs; and 𝑆 is the set of significant SNPs shared between 𝒗𝒊 and 𝒗𝒋. Since a z-score 𝑟𝑖𝑘 can take a negative value, each edge weight 𝑤𝑖𝑗 can range from -1 to 1. Fig. 2a illustrates the constructed disease similarity matrix (𝑾) with both positive and negative associations. The magnitude of a weight (|𝑤𝑖𝑗|) between a disease pair reflects the number of significant SNPs those diseases share. A synergistic association (𝑤𝑖𝑗 > 0) means that two diseases have shared SNPs and the overall effect of those SNPs has the same direction for both diseases. An antagonistic association (𝑤𝑖𝑗 < 0) indicates that the shared
SNPs have overall opposite directions of effect for the two diseases. The similarity matrix can therefore quantify how similar two diseases are in terms of the directions of SNP effects.

2.2 Network-based comorbidity scoring algorithms
Once the DDN incorporating synergistic and antagonistic associations (signed DDN) was constructed, comorbidity scoring was performed using graph-based SSL, which is employed for scoring algorithms (Chong, et al., 2020; Subramanya and Talukdar, 2014). Fig. 2b describes the problem setup for predicting comorbidity scores. The network-based scoring algorithm used here is a transductive learning approach (Chong, et al., 2020); it predicts the co-occurrences between an underlying disease and other diseases when only the underlying disease is known to occur (one positive sample) and all others have unknown occurrences (unlabeled samples). The resulting scores prioritize the unlabeled diseases as comorbidity candidates by propagating the label through the disease associations of the signed DDN. The proposed network simultaneously contains both positive and negative edges, obtained from (1); that is, two diseases having SNPs in common can co-occur regardless of the respective directionality of SNP effects. Nevertheless, the chance of co-occurrence for two synergistically associated diseases may be relatively higher than that of two diseases having an antagonistic association, since the former pair shares many SNPs with the same direction of effect while the latter features SNPs with opposite directions of effect.

The following describes the procedure of comorbidity scoring. Consider a signed DDN 𝑮 = (𝑽, 𝑾) where the similarity matrix 𝑾 has both positive and negative edge weights. Let 𝒚 = (𝑦1, …, 𝑦𝑚)ᵀ denote the initial labels for the set of diseases and 𝒇 = (𝑓1, …, 𝑓𝑚)ᵀ the set of predicted comorbidity scores. We set a unary label (𝑦𝑙 = +1) for an index disease of interest (𝑣𝑙) and leave the remaining diseases unlabeled (𝒚\𝑦𝑙 = {0}). Label information on the labeled disease is propagated to unlabeled diseases on graph 𝑮 to obtain real-valued scores 𝒇, with two assumptions: the smoothness condition (predicted scores 𝑓𝑖 should not be too different from the 𝑓𝑗 values in adjacent unlabeled nodes) and the loss condition (predicted scores 𝑓𝑖 should be close to the given label 𝑦𝑖). Generally, for an unsigned graph 𝑮 = (𝑽, 𝑾), the smoothness condition is represented as

\[
\mathbf{f}^{\mathsf{T}} \mathbf{L} \mathbf{f} = \sum_{i \sim j} w_{ij} \left(f_i - f_j\right)^{2}
\tag{2}
\]

where the graph Laplacian 𝑳 is defined as 𝑳 = 𝑫 − 𝑾, 𝑾 being the unsigned similarity matrix, 𝑫 = diag(𝑑𝑖) the degree matrix, and 𝑑𝑖 = ∑𝑗 𝑤𝑖𝑗. However, since the proposed network has signed edges from (1), we modified the smoothness condition of (2) to handle both positive and negative associations by incorporating a signed degree matrix:

\[
\mathbf{f}^{\mathsf{T}} \mathbf{L} \mathbf{f} = \sum_{i \sim j} |w_{ij}| \left(f_i - \operatorname{sign}(w_{ij})\, f_j\right)^{2}
\tag{3}
\]
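As a rough illustration of (1) and (3), the sketch below computes a signed edge weight from two z-score vectors and evaluates the signed smoothness penalty for a small toy network. It is not the authors' implementation. It assumes that the numerator of (1) sums z-score products over the shared significant SNPs 𝑆, that the denominator is the product of Euclidean norms over all 𝐾 SNPs, and that the "signed degree matrix" is diag(∑𝑗 |𝑤𝑖𝑗|). The function names, the zero-norm fallback, and the toy values are hypothetical.

```python
import math


def signed_edge_weight(z_i, z_j, shared):
    """Signed cosine-style similarity between two diseases (sketch of Eq. 1).

    z_i, z_j: z-scores (beta / SE) over all K SNPs for each disease.
    shared:   indices of significant SNPs shared by both diseases (set S).
    """
    numerator = sum(z_i[k] * z_j[k] for k in shared)
    norm_i = math.sqrt(sum(z * z for z in z_i))
    norm_j = math.sqrt(sum(z * z for z in z_j))
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0  # assumption: no signal at all means no edge
    return numerator / (norm_i * norm_j)


def signed_smoothness(W, f):
    """Right-hand side of Eq. 3, summed over unordered pairs i < j.

    W is assumed symmetric with a zero diagonal.
    """
    total = 0.0
    n = len(f)
    for i in range(n):
        for j in range(i + 1, n):
            w = W[i][j]
            if w != 0.0:
                sign = 1.0 if w > 0 else -1.0
                total += abs(w) * (f[i] - sign * f[j]) ** 2
    return total


def signed_laplacian_form(W, f):
    """Quadratic form f^T (D_bar - W) f with D_bar = diag(sum_j |w_ij|).

    D_bar is an assumed reading of the 'signed degree matrix'.
    """
    n = len(f)
    degree = [sum(abs(W[i][j]) for j in range(n)) for i in range(n)]
    quad = sum(degree[i] * f[i] ** 2 for i in range(n))
    quad -= sum(W[i][j] * f[i] * f[j] for i in range(n) for j in range(n))
    return quad


# Toy example: SNPs 0 and 1 are shared and act in opposite directions.
z_a = [2.0, -1.0, 0.5]
z_b = [-2.0, 1.0, 0.5]
w_ab = signed_edge_weight(z_a, z_b, shared=[0, 1])  # negative: antagonistic

# Index disease 0 has a synergistic edge to 1 and an antagonistic edge to 2.
W_toy = [[0.0, 0.8, -0.5],
         [0.8, 0.0, 0.0],
         [-0.5, 0.0, 0.0]]
f_toy = [1.0, 0.9, -0.4]
penalty = signed_smoothness(W_toy, f_toy)
```

In this toy setting the penalty is small when a synergistic neighbor receives a score close to that of the index disease and an antagonistic neighbor receives a score of opposite sign. Under the stated assumptions, `signed_smoothness` and `signed_laplacian_form` return the same value for any symmetric `W` with a zero diagonal.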
57 have only synergistic associations, 3 have only antagonistic associations, and the remaining three diseases have no associations (Supplementary Data S2, https://hdpm.biomedinfolab.com/ddn/signedDDN). To investigate deeper biological interpretations of the network, we described the composition of phenotypes and interactions in the signed DDN for coronary

sharing of SNPs and more consistent direction of SNP effects; thus, diseases having a stronger synergistic association with a given index disease have higher predicted comorbidity scores. To test this hypothesis, we compared prediction performance between the proposed method (signed DDN) and the baseline method (unsigned DDN). The unsigned DDN was constructed as in previous studies (Goh, et al., 2007; Nam, et al., 2019;
(synergistic/antagonistic association) provide important information for comorbidity prediction.

3.5 Clinical implication of comorbidity scoring based on the

coronary atherosclerosis and the predicted comorbidity was inverse. The scoring algorithm with the signed DDN detected the known relationships that high levels of HDL-C were associated with a low risk of coronary artery disease and with a high risk of age-related macular degeneration (Fan, et al., 2017). A sub-network consisting of all diseases directly connected with coronary atherosclerosis is given in Supplementary Fig.
References
Auton, A., et al. (2015) A global reference for human genetic variation, Nature, 526, 68-74.
Barabási, A.-L., Gulbahce, N. and Loscalzo, J. (2011) Network medicine: a network-based approach to human disease, Nature reviews genetics, 12, 56-68.
Buddeke, J., et al. (2019) Comorbidity in patients with cardiovascular disease in primary care: a cohort study with routine healthcare data, British Journal of
Roitmann, E., Eriksson, R. and Brunak, S. (2014) Patient stratification and identification of adverse event correlations in the space of 1190 drug related adverse events, Frontiers in physiology, 5, 332.
Rubio-Perez, C., et al. (2017) Genetic and functional characterization of disease associations explains comorbidity, Scientific reports, 7, 1-14.
Sánchez-Valle, J., Tejero, H. and Fernández, J.M. (2020) Interpreting molecular similarity between patients as a determinant of disease comorbidity