Review Article
AI in Medicine
Jeffrey M. Drazen, M.D., Editor, Isaac S. Kohane, M.D., Ph.D., Guest Editor,
and Tze‑Yun Leong, Ph.D., Guest Editor
From the Departments of Medicine, Genetics, and Biomedical Data Science, Stanford University, Stanford, CA (B.G., E.A.A.); and the Department of Cardiology, Pneumology, and Angiology, Heidelberg University Hospital, Heidelberg, Germany (B.G.). Dr. Ashley can be contacted at euan@stanford.edu or at Stanford University, Falk Bldg., 870 Quarry Rd., Stanford, CA 94304.

N Engl J Med 2023;388:2456-65.
DOI: 10.1056/NEJMra2204787
Copyright © 2023 Massachusetts Medical Society.

New methods such as genomic sequencing and mass spectrometry have prompted dramatic increases in the amount of molecular data available to scientists and health care professionals seeking more refined diagnoses and increased therapeutic precision.1 Although the largest advances have been made in genetic sequencing of DNA and RNA, medical applications of high-dimensional measurement of proteins and metabolites are increasing. Analytic tools have been improved in parallel to match the volume, velocity, and variety of these molecular “big data.” The emergence of machine learning has proved especially valuable. In these approaches, computer systems use large amounts of data to build predictive statistical models that are iteratively improved by incorporating new data. Deep learning, a powerful subset of machine learning that includes the use of deep neural networks, has had high-profile applications in image object recognition,2 voice recognition, autonomous driving, and virtual assistance. These approaches are now being applied in medicine to yield clinically directive medical information. In this review article, we briefly describe the methods used to generate high-dimensional molecular data and then focus on the key role that machine learning plays in the clinical application of such data.
where the genome being analyzed differs from the reference sequence. This results in a variant call file, a text file typically 3 million to 4 million lines in length and a few megabytes in size. To prioritize the variants in the file according to, for example, their likelihood of causing a rare disease in a patient, filtering or machine-learning approaches are used.6 For RNA sequencing, after mapping, most applications focus on quantification of gene or isoform expression rather than on sequence identity, converting read counts per gene or isoform to a standardized quantitative measure.

In contrast, mass spectrometry is the workhorse of proteomics (the study of all proteins in a cell) and metabolomics (the study of chemical processes involving metabolites in cell metabolism). It generates ions by bombarding organic or inorganic compounds with electrons and separating the resulting positively charged fragments by their mass-to-charge ratio. The first stage of mass spectrometry often involves a separation phase such as liquid chromatography, followed by a spectrometry phase. The output is in the form of spectral plots of ion signal as a function of mass-to-charge ratio. These output plots usually represent superimposed signals from at least hundreds of chemical entities that need to be decomposed into individual signals, mapped by reference to large databases of spectra from known chemical entities, and then further processed. This processing might include, for example, reassembling peptide fragments into full-length proteins.

Machine-Learning Applications in Genomics

The most important advances in the application of machine learning to genomics (the study of the set of genes [the genome] in a cell) have occurred in variant calling: the process of determining where the analyte sequence (e.g., a sample from a patient) varies from the reference sequence. As individual reads are mapped to the corresponding position in the reference genome, they can be visualized as a “pile up” in which bases distinct from the reference are highlighted (Fig. 1). This visual representation facilitates rapid manual review in complex areas of the genome, an insight that led to the development of a deep-learning approach to variant identification that draws on advances in computer vision and image recognition.7 Other approaches to variant calling use machine learning in narrower applications, such as for the technical calibration of error profiles for specific variants or regions of the genome.8

Deep neural networks9 are complex, nonlinear functions fit to large data sets. Multiple layers of alternating “neuron” weights and nonlinearities transform the data into abstract and lower-dimensional representations that are useful for classification. Layers are connected through an activation function, which acts as a gatekeeper to the further propagation of individual outputs. In image tasks, pooling functions are used to downscale inputs over specific areas. Neuron weights are then refined through a process known as back-propagation, and a final classification is usually in the form of a confidence estimate for each of several output options. Convolutional neural networks are a specific form of deep neural networks, often used for image recognition, that are characterized by the process of sliding filters over the image input (Figs. 2 and 3).

With the power of neural networks and the ability to read much longer molecules of DNA, a new era in haplotyping (the mapping of DNA strands to the parental chromosome of origin) could emerge. The haplotyping approach improves the quality of variant calling by better representing the originating DNA molecules and can inform clinical management (for example, in the case of compound heterozygosity, when identification of the parent of origin of two variants at the same locus can affect patient care). Recently, unprecedented accuracy has been achieved with the use of an approach to variant calling that combines haplotyping with models optimized for sequential data, followed by the convolutional neural network approach described above (https://github.com/google/deepvariant/releases/tag/v1.5.0).17

The improvement in variant calling resulting from these advances has been facilitated by the National Institute of Standards and Technology through its Genome in a Bottle Consortium and by the Food and Drug Administration (FDA) through the precisionFDA initiative. Together, these groups run open “Truth Challenge” competitions with standardized samples. Results reveal continuing improvements in the accuracy of variant calling genomewide and specifically in challenging areas of the genome, such as regions encoding the major histocompatibility complex on chromosome 6. F1 accuracy scores, which
take both false positive and false negative results into account, range between 0, for the poorest outcome, and 1.0, for the best outcome. Scores higher than 0.998 have now been reported for the three most commonly applied forms of sequencing technology (with both short- and long-

[Figure 1 panel labels: Synthesis; Nanopore; SMRT; Mass spectrometry (liquid chromatography for separation, retention time; mass spectrometry for analysis, intensity by mass-to-charge ratio [m/z]); Computation (reads aligned to a reference genome; amino acids; protein and metabolite identification).]
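
To make the F1 metric concrete, the following minimal Python sketch computes precision, recall, and their harmonic mean (F1) from counts of true positive, false positive, and false negative variant calls. The function name `f1_score` and the counts are illustrative assumptions, not figures from the Truth Challenge results.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall for a set of variant calls."""
    if true_pos == 0:
        return 0.0  # no correct calls: poorest possible score
    precision = true_pos / (true_pos + false_pos)  # fraction of calls that are real
    recall = true_pos / (true_pos + false_neg)     # fraction of real variants that were called
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only for illustration.
example = f1_score(true_pos=9990, false_pos=10, false_neg=10)
print(round(example, 4))
```

Because F1 is a harmonic mean, a caller cannot reach a high score by trading many missed variants for few false calls, or the reverse; both error types must be small.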
[Figure 2 panel labels: Artificial intelligence; Machine learning; Transformer models (deep neural networks that use self-attention mechanisms to capture relationships between different elements in a sequence); Encoders; Decoders; Multi-head attention; Self-attention; Feed-forward neural network.]
Figure 2. Machine Learning for Biomedical Applications.
Commonly used machine-learning approaches are shown. In random forest models, decision trees are applied to iterative subsets of the data. Support-vector machines separate subgroups according to a hyperplane (separation) in n-dimensional space. Neural networks, learning models inspired by the organization of the human brain, comprise nodes called neurons. Deep learning has been particularly effective in image recognition, especially through convolutional neural networks. For sequential data, recurrent neural networks (including long short-term memory [LSTM] units) allow the network to retain memory of previous input data. Since the networks perform less well with longer input, scientists have increasingly adopted transformer10,11 and large language models12 that ingest sequential data as a whole, prioritizing specific areas for “attention” and incorporating weights from input data found both earlier and later in the data sequence. In a transformer model, the self-attention component assigns weights to the elements in a sequence in order to define their importance within the context. Multi-head attention builds on this idea by incorporating multiple sets of attention to capture diverse patterns and aspects of the sequence.

tion before decoding it back to a representation of the original input, has been shown to improve aberrant splicing prediction from RNA sequencing data (Fig. 2).13

These approaches were used in the case of a 12-year-old girl with developmental regression, tremors, and seizures. Short-read genomic sequencing identified 96 candidate gene variants, none of which appeared to be responsible for the patient’s condition. The addition of a splice-outlier algorithm based on RNA sequencing of the patient’s blood identified a splice-gain variant in KCTD7, which was not in the original list, establishing the diagnosis of progressive myoclonus epilepsy.

Epigenomic Applications

Epigenomics is defined as a complete set of modifications that influence gene expression. Although epigenetic mechanisms are known to play a role in certain rare and common disease presentations, characterization of chemical modifications of DNA at scale is only beginning to have an impact on clinical medicine. Long-read sequencing methods present an exciting opportunity, since they produce signals when the nucleotide passes through a protein nanopore32 or as a base is incorporated by DNA polymerase.33 These signals can be interpreted by machine-learning methods to call not just the nucleotide at that site but any of a series of chemical modifications of that nucleotide. These approaches do not require the bisulfite conversion of the previous standard, which has been shown to cause DNA fragmentation.34 Because of a critical role in tissue-specific transcription, most attention has focused on the addition of a methyl group to the C5 position of cytosine residues in sequential CG dinucleotide sequences called CpG sites. Approaches involving the use of a range of neural networks, including convolutional neural networks,35 bidirectional recurrent neural networks (Fig. 3),14 and a combination of the two types,36 have achieved a C statistic of more than 0.95 for the detection of methylation, outperforming previous benchmark models.36

Machine Learning for Proteomics

Deep learning has facilitated substantial progress in almost all parts of the proteomics workflow.37 With training on patterns of spectral plots from known chemical entities, deep-learning approaches have improved the prediction of spectra of candidate peptides,38-40 which is a pivotal step for tandem mass spectrometry–based proteomics. With the use of a bidirectional, long short-term memory, deep-learning approach, in which a neural network propagates information sequentially forward and backward across an input signal, one algorithm38 predicts peptides with a Pearson correlation coefficient greater than 0.9, surpassing the previous machine learning–based standard of 0.85.41 Peptide retention time, which is the point in time when a peptide is eluted from the liquid chromatography column, has also been predicted accurately with the use of convolutional neural network–based tools.42

Apart from mass spectrometry, de novo peptide sequencing43 and protein identification have been the focus of deep-learning applications that use both convolutional neural networks and long short-term memory approaches. One tool outperformed benchmark peptide-sequencing tools, with a C statistic that was 33% higher than a previous standard,44 and another transformer-based tool showed sequence coverage of 97.7 to 99.5%.10,11 Moreover, large language models have recently been applied to protein-function prediction, with the aim of accelerating drug discovery.12

Post-translational modification of proteins in
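
The C statistic cited above for methylation detection and peptide sequencing is equivalent to the area under the receiver-operating-characteristic curve: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. A minimal Python sketch of that definition follows; the function name `c_statistic` and the toy scores are illustrative assumptions, not data from the cited studies.

```python
from itertools import product


def c_statistic(scores, labels):
    """Concordance (C) statistic for binary labels (1 = positive, 0 = negative)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    concordant = 0.0
    for pos, neg in product(positives, negatives):
        if pos > neg:
            concordant += 1.0   # model ranks the positive case higher
        elif pos == neg:
            concordant += 0.5   # ties count as half
    return concordant / (len(positives) * len(negatives))


# Hypothetical per-site methylation scores and true states, for illustration only.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 1, 0, 1, 0]
print(c_statistic(scores, labels))
```

The pairwise loop is quadratic in the number of examples, which is acceptable for a sketch; a value of 0.5 corresponds to random ranking and 1.0 to perfect separation.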
[Figure 3 panel labels: Genomics (example: variant calling); Transcriptomics (example: alternative splicing); Epigenomics (example: methylation detection); Proteomics (example: 3D structure); Metabolomics (example: diagnosis of rare hereditary anemias); Multi-omics. Other labels: reads; reference genome; protein sequence and databases; input: summary of raw signals; angle predictions; distance predictions.]
Figure 3. Applications of Machine Learning to Omics Data.
Variant calling can be regarded as an image-classification problem (https://github.com/google/deepvariant/releases/tag/v1.5.0), whereas alternative splicing can be predicted through an auto-encoder network.13 In the example of variant calling, the sequence data, quality scores, and other read features are encoded into a multichannel feature representation. This feature representation is then fed into a convolutional neural network to calculate genotype likelihoods for three genotype states: homozygous reference, heterozygous, or homozygous alternate. In the example shown, a heterozygous variant is identified as the most probable genotype. Prediction of methylation has benefited from bidirectional recurrent neural networks.14 Deep-learning applications are increasing the accuracy of predictions of three-dimensional (3D) protein structures.15 Support-vector machines in untargeted metabolomics have shown promise in the diagnosis of rare hereditary anemias.16 Applications of learning models to combined multi-omic inputs represent the next frontier in the pursuit of precision medicine. AI denotes artificial intelligence, and ReLu rectified linear unit.

acetylation46 and ubiquitination.47 Predicting protein function from a peptide sequence has also recently been improved with a combination of machine-learning approaches, namely hidden Markov models and an ensemble of convolutional neural networks.48 The combined approaches contributed functional predictions for 360 previously unannotated human reference proteome proteins, expanding coverage of the standard protein family database by more than 9%.

In a high-profile application of deep learning to proteomics, neural network–based AlphaFold49 (Fig. 3) won the 13th and 14th Critical Assessment of Protein Structure Prediction competitions (specifically, AlphaFold1 won the CASP13 competition and AlphaFold2 won the CASP14 competition). These were biennial, blinded competitions to benchmark progress in protein structure prediction. In the 13th competition, AlphaFold1 created high-accuracy structures for 24 of 43 free modeling domains, greatly surpassing both previous approaches and the next best method, which achieved similar accuracy for only 14 of 43 domains.50 In the CASP14 competition, AlphaFold2 built on this progress, outperforming many competing models.15

The prediction of biomarkers has been a principal clinical focus for proteomics in recent years. Research has been directed at both single-marker and multimarker discovery. In one study, in which protein quantification was achieved with the use of a panel of aptamers (oligonucleotides that bind to proteins), a series of machine-learning models, including logistic regression–based models and random forests (Figs. 2 and 3), were trained to predict 11 different indicators of health that are commonly used for preventive medicine (e.g., the 5-year risk of a primary cardiovascular event) in a panel of approximately 17,000 persons with no major illnesses, from five independent cohort studies.51 Quantification of 94 proteins predicted liver fat with a C statistic of 0.83 in a validation cohort,51 suggesting potential near-term application for noninvasive detection of nonalcoholic fatty liver disease. A machine learning–assisted proteomics approach has also identified circulating biomarkers for alcoholic liver disease, Alzheimer’s disease, and Parkinson’s disease.52

Applications for Metabolomics

Metabolomics focuses on the dynamics of the entire set of small molecules of an organism.53 As compared with proteomics, which is focused on the protein complement, metabolomics includes measurements of fatty acids, lipids, organic acids, amino acids, steroids, and carbohydrates. One of the central clinical applications of metabolomics is the diagnosis of inborn errors of metabolism. Classically, the quantification of specific classes of metabolites such as purines and amino acids is undertaken with the use of individual assays, with the main limitation being a priori assumptions regarding the potentially affected pathways. Mass spectrometry–based metabolomics, in contrast, can be combined with genomic sequencing as an untargeted strategy to address the low diagnostic rate among patients presenting with typical signs of inborn errors of metabolism but with negative results of standard screening. Untargeted metabolomics led to an increase in the diagnostic rate by a factor of 6 in one cross-sectional analysis54 and has been shown to be a useful strategy in targeting deficiencies in the nonoxidative pentose phosphate pathway.55 In a recent study, exome sequencing combined with metabolomics improved variant classification.56 For example, a metabolic fingerprint approach established a diagnosis of pyruvate kinase deficiency with the use of a support vector machine, which identifies subgroups by finding a hyperplane in n-dimensional space
(Figs. 2 and 3).16 In another example, variants in genes for metalloproteins provided the training data for a multichannel convolutional neural network, which showed that mutations in the iron-binding site of metalloproteins were more closely associated with metabolic diseases than were mutations at other locations.57

Multi-omic Applications

As high-dimensional data from multiple types of technology are more readily available, computational approaches to combining data become more important. One of the earliest examples of a multi-omic study (i.e., an approach integrating multiple “omes” data types such as the genome or proteome) was a longitudinal analysis involving a single person that combined genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles.58 Others have since used a multi-omic approach to build a correlation network reflecting health and disease states and by so doing have proposed novel biomarkers for cardiometabolic disease.59 Other integrative approaches making use of deep learning have also been reported. These approaches either fuse the data early, concatenating omics data and then performing a single analysis, or fuse the data later, creating a joint model that combines output from several single omic analyses.60 Some multi-omic approaches have proved successful in the clinical arena, such as in the identification of leucine zipper transcription factor–like 1 (LZTFL1) as a candidate effector gene at a coronavirus disease 2019 (Covid-19) risk locus, with the use of previously published machine-learning models, including a neural network.61 By suggesting that increased expression of LZTFL1 might be associated with a worse outcome, this insight reveals novel candidate targets for the prevention and treatment of Covid-19. Novel biomarkers of the response to immunotherapy have also been revealed through analysis of genomic, transcriptomic, and immunomic response data in cancer with the use of a support-vector machine.62

Conclusions

Over the past decade, technological advances have greatly enhanced our ability to measure fundamental biologic processes at scale. The resulting volume of data has been met with machine-learning methods that are increasingly tuned for the analysis of multidimensional biologic data sets. The outcome is a progressively detailed understanding of the molecular trajectory of disease that is now finding application in clinical medicine, with the greatest progress having been made in the diagnosis, and in some cases treatment, of rare genetic diseases. Challenges remain, including data quality, data consistency, and clinician awareness. However, as single-omic discovery gives way to multi-omic application, standardization of pipelines, expansion of benchmark metrics, and acceleration in the speed and accuracy of data processing will ensure that the potential for a far-reaching impact on precision health care is realized.

Disclosure forms provided by the authors are available with the full text of this article at NEJM.org.

References
1. Ashley EA, Butte AJ, Wheeler MT, et al. Clinical assessment incorporating a personal genome. Lancet 2010;375:1525-35.
2. Ouyang D, He B, Ghorbani A, et al. Video-based AI for beat-to-beat assessment of cardiac function. Nature 2020;580:252-6.
3. Nurk S, Koren S, Rhie A, et al. The complete sequence of a human genome. Science 2022;376:44-53.
4. Gorzynski JE, Goenka SD, Shafin K, et al. Ultrarapid nanopore genome sequencing in a critical care setting. N Engl J Med 2022;386:700-2.
5. Li H, Durbin R. Inference of human population history from individual whole-genome sequences. Nature 2011;475:493-6.
6. Lightbody G, Haberland V, Browne F, et al. Review of applications of high-throughput sequencing in personalized medicine: barriers and facilitators of future progress in research and clinical application. Brief Bioinform 2019;20:1795-811.
7. Poplin R, Chang P-C, Alexander D, et al. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol 2018;36:983-7.
8. DePristo MA, Banks E, Poplin R, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 2011;43:491-8.
9. Goodfellow I, Bengio Y, Courville A. Deep learning. Cambridge, MA: MIT Press, 2016.
10. Beslic D, Tscheuschner G, Renard BY, Weller MG, Muth T. Comprehensive evaluation of peptide de novo sequencing tools for monoclonal antibody assembly. Brief Bioinform 2023;24:bbac542.
11. Yilmaz M, Fondrie W, Bittremieux W, Oh S, Noble WS. De novo mass spectrometry peptide sequencing with a transformer model. Proceedings of the 39th International Conference on Machine Learning, 2022 (https://proceedings.mlr.press/v162/yilmaz22a/yilmaz22a.pdf).
12. Stern A. NVIDIA expands large language models to biology. Santa Clara, CA: NVIDIA, September 20, 2022 (https://blogs.nvidia.com/blog/2022/09/20/bionemo-large-language-models-drug-discovery/).
13. Mertes C, Scheller IF, Yépez VA, et al. Detection of aberrant splicing events in RNA-seq data using FRASER. Nat Commun 2021;12:529.
14. Liu Q, Fang L, Yu G, Wang D, Xiao C-L, Wang K. Detection of DNA base modifications by deep recurrent neural network on Oxford Nanopore sequencing data. Nat Commun 2019;10:2449.
15. Jumper J, Evans R, Pritzel A, et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021;596:583-9.
16. Van Dooijeweert B, Broeks MH, Verhoeven-Duif NM, et al. Untargeted metabolic profiling in dried blood spots identifies disease fingerprint for pyruvate kinase deficiency. Haematologica 2021;106:2720-5.
17. Shafin K, Pesout T, Chang P-C, et al. Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads. Nat Methods 2021;18:1322-32.
18. Olson ND, Wagner J, McDaniel J, et al. PrecisionFDA Truth Challenge V2: calling variants from short and long reads in difficult-to-map regions. Cell Genom 2022;2:100129.
19. Nicholls HL, John CR, Watson DS, Munroe PB, Barnes MR, Cabrera CP. Reaching the end-game for GWAS: machine learning approaches for the prioritization of complex disease loci. Front Genet 2020;11:350.
20. Birgmeier J, Haeussler M, Deisseroth CA, et al. AMELIE speeds Mendelian diagnosis by matching patient phenotype and genotype to primary literature. Sci Transl Med 2020;12(544):eaau9113.
21. De La Vega FM, Chowdhury S, Moore B, et al. Artificial intelligence enables comprehensive genome interpretation and nomination of candidate diagnoses for rare genetic diseases. Genome Med 2021;13:153.
22. Splinter K, Adams DR, Bacino CA, et al. Effect of genetic diagnosis on patients with previously undiagnosed disease. N Engl J Med 2018;379:2131-9.
23. Lee H, Deignan JL, Dorrani N, et al. Clinical exome sequencing for genetic identification of rare Mendelian disorders. JAMA 2014;312:1880-7.
24. Wright CF, Campbell P, Eberhardt RY, et al. Genomic diagnosis of rare pediatric disease in the United Kingdom and Ireland. N Engl J Med 2023;388:1559-71.
25. Dewey FE, Grove ME, Pan C, et al. Clinical interpretation and implications of whole-genome sequencing. JAMA 2014;311:1035-45.
26. Park CY, Zhou J, Wong AK, et al. Genome-wide landscape of RNA-binding protein target site dysregulation reveals a major impact on psychiatric disorder risk. Nat Genet 2021;53:166-73.
27. Li X, Battle A, Karczewski KJ, et al. Transcriptome sequencing of a large human family identifies the impact of rare noncoding variants. Am J Hum Genet 2014;95:245-56.
28. Li X, Kim Y, Tsang EK, et al. The impact of rare variation on gene expression across tissues. Nature 2017;550:239-43.
29. Frésard L, Smail C, Ferraro NM, et al. Identification of rare-disease genes using blood transcriptome sequencing and large control cohorts. Nat Med 2019;25:911-9.
30. Ferraro NM, Strober BJ, Einson J, et al. Transcriptomic signatures across human tissues identify functional rare genetic variation. Science 2020;369(6509):eaaz5900.
31. Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, et al. Predicting splicing from primary sequence with deep learning. Cell 2019;176(6):535-548.e24.
32. Bowden R, Davies RW, Heger A, et al. Sequencing of human genomes with nanopore technology. Nat Commun 2019;10:1869.
33. PacBio (https://www.pacb.com/).
34. Feng S, Zhong Z, Wang M, Jacobsen SE. Efficient and accurate determination of genome-wide DNA methylation patterns in Arabidopsis thaliana with enzymatic methyl sequencing. Epigenetics Chromatin 2020;13:42.
35. Tse OYO, Jiang P, Cheng SH, et al. Genome-wide detection of cytosine methylation by single molecule real-time sequencing. Proc Natl Acad Sci U S A 2021;118(5):e2019768118.
36. Ni P, Huang N, Zhang Z, et al. DeepSignal: detecting DNA methylation state from Nanopore sequencing reads using deep-learning. Bioinformatics 2019;35:4586-95.
37. Wen B, Zeng W-F, Liao Y, et al. Deep learning in proteomics. Proteomics 2020;20(21-22):e1900335.
38. Zhou X-X, Zeng W-F, Chi H, et al. pDeep: predicting MS/MS spectra of peptides with deep learning. Anal Chem 2017;89:12690-7.
39. Zeng W-F, Zhou X-X, Zhou W-J, Chi H, Zhan J, He S-M. MS/MS spectrum prediction for modified peptides using pDeep2 trained by transfer learning. Anal Chem 2019;91:9724-31.
40. Liu K, Li S, Wang L, Ye Y, Tang H. Full-spectrum prediction of peptides tandem mass spectra using deep neural network. Anal Chem 2020;92:4275-83.
41. Li S, Arnold RJ, Tang H, Radivojac P. On the accuracy and limits of peptide fragmentation spectrum prediction. Anal Chem 2011;83:790-6.
42. Bouwmeester R, Gabriels R, Hulstaert N, Martens L, Degroeve S. DeepLC can predict retention times for peptides that carry as-yet unseen modifications. Nat Methods 2021;18:1363-9.
43. Sinitcyn P, Richards AL, Weatheritt RJ, et al. Global detection of human variants and isoforms by deep proteome sequencing. Nat Biotechnol 2023.
44. Tran NH, Zhang X, Xin L, Shan B, Li M. De novo peptide sequencing by deep learning. Proc Natl Acad Sci U S A 2017;114:8247-52.
45. Wang Y-C, Peterson SE, Loring JF. Protein post-translational modifications and regulation of pluripotency in human stem cells. Cell Res 2014;24:143-60.
46. Zhao X, Li J, Wang R, He F, Yue L, Yin M. General and species-specific lysine acetylation site prediction using a bi-modal deep architecture. IEEE 2018;6:63560-9.
47. Fu H, Yang Y, Wang X, Wang H, Xu Y. DeepUbi: a deep learning framework for prediction of ubiquitination sites in proteins. BMC Bioinformatics 2019;20:86.
48. Bileschi ML, Belanger D, Bryant DH, et al. Using deep learning to annotate the protein universe. Nat Biotechnol 2022;40:932-7.
49. Mancino DJ. Breakthrough to nursing timeline. Imprint 2010;57:35-41.
50. Senior AW, Evans R, Jumper J, et al. Improved protein structure prediction using potentials from deep learning. Nature 2020;577:706-10.
51. Williams SA, Kivimaki M, Langenberg C, et al. Plasma protein patterns as comprehensive indicators of health. Nat Med 2019;25:1851-7.
52. Mann M, Kumar C, Zeng W-F, Strauss MT. Artificial intelligence for proteomics and biomarker discovery. Cell Syst 2021;12:759-70.
53. Lindon JC, Nicholson JK, Holmes E. The handbook of metabonomics and metabolomics. New York: Elsevier, 2011.
54. Liu N, Xiao J, Gijavanekar C, et al. Comparison of untargeted metabolomic profiling vs traditional metabolic screening to identify inborn errors of metabolism. JAMA Netw Open 2021;4(7):e2114155.
55. Shayota BJ, Donti TR, Xiao J, et al. Untargeted metabolomics as an unbiased approach to the diagnosis of inborn errors of metabolism of the non-oxidative branch of the pentose phosphate pathway. Mol Genet Metab 2020;131:147-54.
56. Alaimo JT, Glinton KE, Liu N, et al. Integrated analysis of metabolomic profiling and exome data supplements sequence variant interpretation, classification, and diagnosis. Genet Med 2020;22:1560-6.
57. Koohi-Moghadam M, Wang H, Wang Y, et al. Predicting disease-associated mutation of metal-binding sites in proteins using a deep learning approach. Nature Machine Intelligence 2019;1:561-7.
58. Chen R, Mias GI, Li-Pook-Than J, et al. Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell 2012;148:1293-307.
59. Price ND, Magis AT, Earls JC, et al. A wellness study of 108 individuals using personal, dense, dynamic data clouds. Nat Biotechnol 2017;35:747-56.
60. Reel PS, Reel S, Pearson E, Trucco E, Jefferson E. Using machine learning approaches for multi-omics data analysis: a review. Biotechnol Adv 2021;49:107739.
61. Downes DJ, Cross AR, Hua P, et al. Identification of LZTFL1 as a candidate effector gene at a COVID-19 risk locus. Nat Genet 2021;53:1606-15.
62. Chen S, Lai H, Zhao J, et al. The viral expression and immune status in human cancers and insights into novel biomarkers of immunotherapy. BMC Cancer 2021;21:1183.