
Applications of Deep Learning in Endocrine Neoplasms
Siddhi Ramesh, BAa,c, James M. Dolezal, MDa,c,
Alexander T. Pearson, MD, PhDa,b,c,*

KEYWORDS
• Machine learning • Deep learning • Endocrine neoplasia • Pathology • Histology

Key points
• Deep learning applications in histopathology have demonstrated the ability to enhance automation
  to reduce pathologist workloads and to abstract image-related features that are indeterminable from
  pure human inspection.
• Centralized data repositories must be an emphasis to improve data access and improve the quality
  and reliability of histopathology-related deep learning studies.
• Standardized reporting and evaluation criteria must be established to improve study interpretability
  and comparability to ultimately improve clinical adoption of deep learning models.

ABSTRACT

Machine learning methods have been growing in prominence across all areas of medicine. In
pathology, recent advances in deep learning (DL) have enabled computational analysis of
histological samples, aiding in diagnosis and characterization in multiple disease areas. In
cancer, and particularly endocrine cancer, DL approaches have been shown to be useful in tasks
ranging from tumor grading to gene expression prediction. This review summarizes the current
state of DL research in endocrine cancer histopathology with an emphasis on experimental design,
significant findings, and key limitations.

OVERVIEW

In recent years, an exponential increase in data and computational infrastructure across all of
medicine has led to significant momentum and interest in developing new methods for data
analysis. The use of artificial intelligence (AI) and machine learning (ML) methodologies has
continued to grow across many areas within medicine. Specific methodologies such as deep
learning (DL) now enable the efficient analysis of complex data to solve issues with large-scale
image classification, free text natural language processing, and signal processing.1
In medicine, diabetic retinopathy image interpretation,2 electrocardiogram analysis,3 and
stroke or intracranial hemorrhage4 are a few examples of areas with FDA-approved AI tools. Within
oncology, areas such as cancer radiology, radiation oncology, gynecology-oncology, clinical
oncology, and pathology have seen increasing numbers of DL tools undergoing successful FDA
approval, with the largest number of DL tools seen in radiology and pathology applications.5
However, thus far, these FDA-approved tools are
surgpath.theclinics.com

a Department of Medicine, Section of Hematology/Oncology, University of Chicago Medical Center, 5841
South Maryland Avenue, MC 2115, Chicago, IL 60637, USA; b University of Chicago Comprehensive Cancer
Center, Chicago, IL, USA; c The University of Chicago Medicine & Biological Sciences, 5841 South
Maryland Avenue, Chicago, IL, USA
* Corresponding author.
E-mail address: apearson5@medicine.bsd.uchicago.edu

Surgical Pathology 16 (2023) 167–176


https://doi.org/10.1016/j.path.2022.09.014
1875-9181/23/© 2022 Elsevier Inc. All rights reserved.
168 Ramesh et al

not viewed as independent substitutes for traditional physician-based diagnostic processes.
Instead, they are being used to further augment physician workflows, particularly during
diagnosis.5 Outside of FDA-associated studies, there have been numerous studies of the
application of DL in areas outside of diagnostics such as prognosis and treatment response.6 The
cancer subtypes that are experiencing the most growth and focus within DL research include
breast, lung, and prostate, which may be attributable to their relatively higher incidence
compared with other cancer subtypes.6
In this review, we discuss the application of DL methods to histopathologic assessment of
endocrine neoplasms, including thyroid, pancreatic, neuroendocrine, parathyroid, pituitary, and
adrenal tumors. We will review key studies that characterize the state of pathology-related DL
studies within these endocrine neoplasms, discuss strengths and weaknesses of the existing
literature, and identify potential areas for future research. Ultimately, this review will
assess trends seen across the abstracted literature and highlight areas for methodological
improvement for future studies within computational pathology.

WHAT IS MACHINE LEARNING?

Broadly, AI is the domain of study utilizing algorithms to enable a machine to perform tasks with
human-like intelligence (Fig. 1). ML is a subset of AI encompassing statistical methods that
identify patterns and trends within datasets and enable generation of predictions for novel data.
Neural networks (NNs) are ML algorithms that use a series of connected neurons that function in
cohesion to identify patterns, loosely resembling the way interconnected neurons function within
the brain. NNs are trained on example data through pattern recognition using a feedback process
known as backpropagation, where the network compares the output it generates to the value that is
known to be correct, ultimately using the difference between these values to modify the
connections among the units in the network. DL is a methodology utilizing NNs with many layers,
enabling the extraction of progressively higher levels of features from input data. There are
variations to the structure of DL NNs catered toward the complexity of the underlying data.
Although there is a multitude of subclasses of NNs, one example is known as a convolutional
neural network (CNN). CNNs include one or more layers of convolutional units,

Fig. 1. Relationships between AI terminologies.
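
The backpropagation feedback loop described in this section can be illustrated with a very small example. The sketch below is not taken from any study in this review: the network shape (one input, one hidden unit, one output), the starting weights, the learning rate, and the toy data are all assumptions chosen for readability. It shows the three steps named in the text: generate an output, compare it with the known correct value, and use the difference to adjust the connections.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_tiny_network(samples, epochs=5000, lr=0.5):
    # Hypothetical network: 1 input -> 1 hidden unit -> 1 output unit.
    # Starting weights are fixed (not random) so the run is repeatable.
    w1, b1, w2, b2 = 0.5, 0.0, 0.5, 0.0
    losses = []
    for _ in range(epochs):
        epoch_loss = 0.0
        for x, y in samples:
            # Forward pass: compute the network's output for this example.
            h = sigmoid(w1 * x + b1)
            out = sigmoid(w2 * h + b2)
            # Compare the output with the known correct value.
            error = out - y
            epoch_loss += 0.5 * error ** 2
            # Backward pass: propagate the error through each layer (chain rule).
            grad_out = error * out * (1.0 - out)
            grad_hidden = grad_out * w2 * h * (1.0 - h)
            # Adjust the connections in proportion to their share of the error.
            w2 -= lr * grad_out * h
            b2 -= lr * grad_out
            w1 -= lr * grad_hidden * x
            b1 -= lr * grad_hidden
        losses.append(epoch_loss)
    return (w1, b1, w2, b2), losses

# Toy data (assumed): the label is 1 when the input is large, 0 when it is small.
toy_samples = [(0.0, 0), (0.2, 0), (0.8, 1), (1.0, 1)]
weights, losses = train_tiny_network(toy_samples)
print(f"loss: first epoch {losses[0]:.3f}, last epoch {losses[-1]:.3f}")
```

Deep networks used in pathology apply the same idea across many layers and millions of weights, typically through a framework that computes the gradients automatically.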
Deep Learning in Endocrine Neoplasms 169

which function to receive multiple inputs from a prior layer to create an understanding of
proximity, which can often reduce network complexity in cases where data proximity is important
(eg, imaging, text, speech). Beyond specific subclasses of NNs, there are additional structural
variations such as network depth. For example, shallow NNs (containing fewer layers) are used to
calculate an output from an input in one round of processing (eg, clinical scoring algorithms
such as CURB-65).7 In contrast, deep NNs (containing more layers) can represent more complex
functions and understand underlying complexity of spatial data that requires more nuance than
classical clinical scoring techniques.7

The key purpose of ML when compared with traditional statistical approaches is the concept that a
model can learn from examples, rather than requiring explicit rules to be defined by a human.8
During supervised learning, an ML algorithm is trained to predict outcomes through exposure to a
large amount of labeled data. Most often, these predictions are used for classification tasks in
which the target variable is a discrete outcome or in regression tasks where the target variable
is a continuous outcome. Algorithms are provided input data, or features, and associated outputs,
known as labels (Fig. 2). For applications in pathology, the input data could be a digitized
image of a histopathology slide (the pixels of the image

Fig. 2. Digital pathology DL study workflow.
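
To make the idea of a convolutional unit concrete, the following is a minimal sketch, assuming a tiny grayscale "image" stored as nested lists and a hand-picked 2 × 2 kernel (both hypothetical). As in most DL frameworks, the operation shown is technically a cross-correlation: the same small set of weights slides over every neighborhood of the input, so each output value depends only on nearby pixels.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            # Each output value is a weighted sum over one local neighborhood.
            for a in range(kh):
                for b in range(kw):
                    acc += image[i + a][j + b] * kernel[a][b]
            row.append(acc)
        feature_map.append(row)
    return feature_map

# Assumed 4 x 4 image with a vertical edge between the dark (0) and bright (1) halves.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Assumed kernel that responds to a dark-to-bright transition from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d(image, kernel)
print(feature_map)  # [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

The kernel here has only 4 weights, reused at every position. A fully connected layer mapping the same 16 inputs to 9 outputs would need 16 × 9 = 144 weights, which is one way to see how convolution can reduce network complexity when proximity matters.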



converted into features) labeled with the correct diagnosis. The algorithm then learns to
correlate these features to the provided labels through a process known as training. After being
exposed to a sufficient number and diversity of examples through training, the model’s ability to
correctly predict labels on a set of held-out test data is formally assessed. Ideally, the test
dataset should be abstracted from a source distinct from the training data so that the model can
be evaluated on its ability to generalize to a completely novel setting, ensuring that true
underlying and abstractable biological features are driving predictions, rather than underlying
noise inherent to a particular dataset (see Fig. 2).
Conventional ML techniques, such as logistic regression, support vector machine, random forest,
and gradient boosting machines, are limited in their ability to process large, complex datasets
in unprocessed states.9 Constructing an effective pattern-recognition system requires significant
manual engineering of input data in order to allow the algorithm to effectively extract patterns
found throughout the dataset.9 In comparison, DL methods are able to better learn complex,
high-level features within large datasets and are being increasingly investigated for medical
applications (eg, histopathology, medical imaging, physician electronic medical record notes).10

DEEP LEARNING IN ENDOCRINE PATHOLOGY

One of the many areas in which DL applications are being increasingly explored is histopathology.
In current medical practice, trained pathologists interpret histopathology slides through visual
inspection, identifying characteristics that allow for the assessment of a wide range of
diseases, from cancer to inflammatory disorders. As medical knowledge of human disease continues
to expand, there has been an increase in cases requiring pathological interpretation, increasing
the need for high-throughput review.11
The use of computational analysis to augment tissue sample analysis is known as computational
pathology, or CPATH.11 Early attempts at CPATH utilized feature engineering, or explicitly defined
characteristics provided to a computer, such as cell size or shape, in an attempt to help
automate pathologist analysis. However, modern CPATH projects often utilize DL techniques that
obviate some manual analysis compared with other traditional approaches. Ultimately, DL
methodologies can serve as a tool to augment pathologist workflows, enabling more efficiency,
while democratizing pathological analysis to areas that may not have dedicated pathologists
otherwise.12
In this section, we will describe the current state of DL applications in endocrine cancer
pathology (Table 1), with an emphasis on experimental design, findings, and key limitations. All
articles included were abstracted on March 14, 2022.

THYROID NEOPLASIA

Thyroid cancers represent the most common cancer of the endocrine system. The majority of cases
within thyroid cancer are of the papillary thyroid carcinoma (PTC) subtype, accounting for 70% to
80% of overall cases,13 although other subtypes include follicular thyroid carcinoma (FTC),
medullary thyroid carcinoma, and anaplastic thyroid carcinoma. There has been some notable
progress in CPATH applications in this domain, with applications aimed at tumor
identification,14–16 classification,17–20 mutation prediction,20–23 and segmentation13 from both
cytopathologic and histologic samples. Below, we briefly review a sampling of representative
studies, summarizing aims, results, and limitations.
A recent study by Lin and colleagues13 used a CNN for diagnosis and segmentation of PTC with 131
fine needle aspiration and ThinPrep PTC samples digitized as whole slide images (WSIs). All PTC
slides were annotated with ground truth annotations by 2 pathologists with 28 slides used for
training and 103 slides used for testing. The images were initially processed to remove
background noise, and a VGG16-based CNN architecture was developed to identify and segment
malignant PTC. The authors demonstrated strong results, showing that their proposed method
yielded an accuracy of 99%, precision of 86%, and recall of 98%, outperforming existing
architectures.13 The study is limited largely by the training dataset in which all 131 samples
were PTC, making it difficult to assess how such a model might perform when presented with benign
samples or samples belonging to a different subtype of thyroid malignancy.
Dolezal and colleagues developed a regression model to predict BRAF-RAS gene expression score
(BRS) and used the predicted scores to identify noninvasive follicular thyroid neoplasm with
papillary-like nuclear features (NIFTP). NIFTP is a diagnostically challenging subtype of
follicular thyroid neoplasms known for its high interobserver variability and benign clinical
course. It has been associated with RAS mutational profiles, whereas papillary thyroid carcinoma
with extensive follicular growth (PTC-EFG) is known to be associated with BRAF V600E
mutations.20 Although not all tumors carry these associated mutations, the authors hypothesized
that nonmutant NIFTPs and

Table 1
A survey of neural networks deployed in endocrine neoplasia studies

Manuscript                Disease Area(s)           Task                            Method  Number of Slides/Samples  Type of Data     Validation Strategy
Lin et al,13 2021         Thyroid neoplasia         Diagnosis                       CNN     131                       Cytology         Internal validation
Dolezal et al,20 2021     Thyroid neoplasia         Gene expression prediction      CNN     115                       Histology        External validation
Böhland et al,19 2021     Thyroid neoplasia         Classification (subtype)        CNN     289                       Histology        External validation
Kriegsmann et al,30 2021  Pancreatic neoplasia      Diagnosis                       CNN     201                       Histology        Internal validation
Naito et al,31 2021       Pancreatic neoplasia      Diagnosis                       CNN     532                       Histology        Internal validation
Redemann et al,35 2020    Neuroendocrine neoplasia  Site of origin prediction       CNN     215                       Histology        Internal validation
Govind et al,36 2020      Neuroendocrine neoplasia  Grading                         CNN     50                        Histology        Internal validation
Dum et al,41 2022         Adrenal neoplasia         Immune infiltration prediction  CNN     9405                      Histology (IHC)  Internal validation

Abbreviations: CNN, convolutional neural network.

follicular-patterned PTCs might still possess differences in BRAF-RAS spectrum gene expression
that could be leveraged to train DL models to learn the histologic differences between these
classes. They trained a regression model on 386 slides from The Cancer Genome Atlas to predict
BRS, and generated BRS predictions on an external dataset of 115 slides of classic PTC, PTC-EFG,
NIFTP, and benign follicular adenoma (FA). The authors found that the DL BRS predictions were
highly associated with the NIFTP subtype in the external dataset, with RAS-like BRS predictions
identifying NIFTP neoplasms with a sensitivity of 97.9% and a specificity of 96.6%. The study is
limited by the utilization of pathologist-annotated regions of interest and a lack of
ground-truth gene expression scores on the external dataset.
Böhland and colleagues19 explored DL applications in PTC by comparing DL performance with that of
feature-based classification ML methods in differentiating samples with and without “papillary
thyroid carcinoma-like” nuclei. PTC-like nuclei may be seen in NIFTP and follicular variants of
papillary thyroid carcinoma (FVPTC); however, FA and FTC are neoplastic subtypes that do not
demonstrate PTC-like nuclei. The study used 2 datasets: the Tharun and Thompson dataset (156
samples divided into FTC, FA, NIFTP, FVPTC, and PTC) and the Nikiforov dataset (133 samples
divided into benign, FTC, classic PTC, invasive FVPTC, and encapsulated FVPTC). The feature-based
classification model used a pretrained model to identify nuclei within the WSIs (trained
externally on a nonthyroid dataset), identified features within the segmented nuclei, and
subsequently scored each slide based on these abstracted features, using an aggregated score
threshold to determine a classification of PTC-like versus non-PTC-like. The DL approach also
leveraged a model pretrained on ImageNet.24 Ultimately, the results demonstrate that the
feature-based approach achieved an accuracy of 89.7% and 83.5% on the Tharun and Thompson and
Nikiforov datasets, respectively, whereas the DL approach yielded results of 83.6% and 89.1%.
The study is limited by minimal inclusion of borderline datasets as well as a high reliance on
image preprocessing, in addition to a limited emphasis on results explainability.

PANCREATIC EPITHELIAL NEOPLASIA

Although not strictly endocrine but arising in a mixed exocrine/endocrine organ, pancreatic
ductal adenocarcinoma (PDAC) has been a key area of focus in many research disciplines because it
is the fourth leading cause of cancer-attributable deaths in the United States and predicted to
increase in prevalence during the next decade.25 Although a multitude of studies have evaluated
AI-augmented diagnosis with radiographic image analysis (eg, CT),26 there are limited examples of
applications that use CPATH approaches outside of the studies sampled below.27–29
One such example of DL approaches in histopathology within pancreatic neoplasms is a 2021 study
by Kriegsmann and colleagues,30 who developed the first DL-based algorithm to identify key
anatomic features and differentiate pancreatic intraepithelial neoplasia (PanIN) from PDAC. The
study included 111 patients (201 slides) who were randomly assigned to a training (120 patients),
validation (41 patients), and test (40 patients) dataset. Each of the patient-associated slides
was manually annotated by a pathologist to show representative regions, or “patches,” including
endocrine islands, exocrine islands, fatty tissue, lymph node metastasis, nontumor fibrosis,
normal ducts, tumor-associated stroma, tumor-free pancreatic lymph node, low-grade PanIN,
high-grade PanIN, and pancreatic adenocarcinoma. For evaluation of the model, the authors used
balanced accuracy (BACC; defined as the arithmetic mean between sensitivity and specificity to
correct for class imbalances) and aggregated all benign categories, both PanIN, and malignant
categories into 3 individual classes. The study evaluated multiple prediction thresholds to
evaluate the model’s performance on confident predictions, while mitigating model uncertainty,
ultimately finding that a 0.90 prediction cutoff enabled a BACC of 92.12% (discarding 53.59% of
predictions).30 Although this study did provide an initial foray into CPATH within the PDAC
disease area, it remains experimentally limited. First, the requirement to annotate 11 individual
classes during training, while subsequently reaggregating the classes during evaluation, raises
limitations on results interpretability, particularly when combined with an adjusted evaluation
metric such as BACC, which further corrects for class imbalances as a single point estimate of
accuracy, rather than more traditional uncorrected ML evaluation metrics such as area under the
receiver operating characteristic curve (AUROC) and precision or recall. Additionally, the use of
a 90% threshold in prediction cutoff causes a substantial loss of model robustness as more than
half of the sample predictions were discarded. Finally, the most effective method to test model
performance and generalizability is to utilize an external validation set that is abstracted from
a source distinct from that used in training; however, no external dataset was used in this
study.
Naito and colleagues31 built on these results in their study that used DL predictions to diagnose
PDAC versus noncancerous pathology (eg, autoimmune pancreatitis) on endoscopic ultrasound-fine
needle biopsy samples. The study used 532 WSIs that were split into 372 slides for training, 40
for validation, and 120 for testing (the testing dataset was selected from a subset of 182 WSIs
in which all 3 pathologist reviewers agreed on a definitive diagnosis). Additionally, 62 WSIs
were discarded after stratification due to pathologist evaluation revealing the diagnosis to be
“indeterminate.” Before training, the WSIs were annotated by pancreatic pathologists to highlight
isolated carcinoma components and invasive ductal carcinoma components. The model achieved an
AUROC of 0.984, demonstrating significant ability to identify cancer versus noncancer pathologic
condition. Although this study shows exceptional performance, there are multiple limitations to
the study design. During group stratification, the elimination of all indeterminate diagnoses led
to augmented performance that does not mimic real-world clinical scenarios in which many
similarly indeterminate diagnoses are present. Additionally, the model was tested on a selected
dataset of cases in which a diagnosis was consistent across 3 reviewers, creating a scenario
where the model is only tested on more clinically “obvious” cases, while also not utilizing an
external dataset for validation. Finally, the manual annotation of regions of interest by human
pathologists necessitated a significant amount of manual input into this workflow, which is an
important consideration for future real-world application.

NEUROENDOCRINE NEOPLASIA

Neuroendocrine neoplasms (NENs) are rare tumors that can be seen throughout the gastrointestinal
(GI) tract, although with lower prevalence and incidence compared with thyroid and pancreatic
NENs.32 Most neuroendocrine tumors (NETs) are well differentiated and can be more localized to a
specific organ (eg, pancreas). To date, various biomarkers (mainly in immunohistochemistry [IHC])
have been associated with NENs (eg, Ki-67). There have been several studies that are

focused on DL applications within the NEN disease area,33–35 particularly over the last 3 years;
however, as seen in other disease areas, the majority of studies focus on radiomics applications,
likely given the ubiquity of radiological datasets relative to other data modalities such as
histopathology.34 A subset of literature with notable findings is highlighted below.
Redemann and colleagues35 developed a DL model to predict the site of origin for metastatic
well-differentiated NETs, given the task is difficult to consistently accomplish even with IHC
techniques. The study used 215 well-differentiated NET hematoxylin and eosin-stained slides with
a known primary site. Of the overall sample, 130 slides were used for training and 85 slides were
used for testing. Compared with IHC (82% accuracy in site-of-origin prediction), the DL model
demonstrated an accuracy rate of 72%, ultimately demonstrating that the performance between IHC
gold-standard approaches and DL was not statistically significantly different. The study is
limited by an overall small sample size and the lack of a discrete external validation set to
evaluate model generalizability. Furthermore, there was no emphasis on an analysis of model
explainability. Nonetheless, the results do provide an initial proof-of-concept for DL approaches
to be used in this disease area.
Another study by Govind and colleagues36 evaluated the ability of a DL platform to improve the
accuracy of GI-NET grading. The study looked to improve GI-NET grading by building on traditional
metrics such as the Ki-67 index. The authors used 50 samples derived from various GI sites (8
stomach, 13 small bowel, 5 appendix, 3 colon, 16 rectum, 5 pancreas) with 2 samples discarded due
to staining issues.37 The authors first developed an initial integrated approach termed
“Synaptophysin-Ki-67 Index Estimator” where the non-DL model was trained to locate tumor cells,
detect hot spots, and calculate the Ki-67 index. The WSIs were cropped into hot-spot-sized tiles
and categorized into 4 separate categories of background, nontumor, G1 tumor, and G2 tumor, which
served as ground truth for their subsequent DL approach. These tiles were then used to train (42
cases; 15,232 image tiles) and test (6 cases; 9436 image tiles) their DL model. The results
demonstrated that when compared with the gold-standard approaches, the model agreed with tumor
grade in 45 out of 48 (93.8%) cases and had a Ki-67 index error (difference between the
gold-standard index and the estimated index) of 0.84 ± 1.02%. The study demonstrated an
interesting methodology with fully automated “hot-spot detection”; however, it remains
constrained in its evaluation. The study is limited overall by its low sample size and lack of
true ground truth metrics for training and evaluation of model performance. Additionally, there
was no external dataset used for the evaluation of model performance, making it difficult to
assess whether this approach can perform effectively in novel settings.

PITUITARY, PARATHYROID, AND ADRENAL NEOPLASIA

When compared with other endocrine neoplasms, pituitary, parathyroid, and adrenal neoplasias are
among the least prevalent.38,39 These neoplasms have been explored with DL applications using
radiomics and genomics data40; however, there are currently limited examples of studies using
histopathology datasets. This may be due to the relative scarcity of data in these domains,
particularly given the relative rarity of malignant neoplasms of the pituitary, parathyroid, and
adrenal glands.
Dum and colleagues41 evaluated 90 different tumor entities (including adrenocortical adenoma and
pheochromocytoma) to assess the feasibility of a high-throughput analysis of lymphocyte
subpopulations by using an AI-supported multiple antibody (cytotoxic T-lymphocyte associated
protein 4 [CTLA-4]) approach within multiple tumor subtypes. The study used 2 different CTLA-4
antibodies due to limitations with using a single antibody on formalin-fixed tissue. The study
incorporated 9405 images from the 90 tumor types used to train and validate a DL approach for
detecting nonspecific staining. The digital images were analyzed using a multistep approach,
using a CNN (U-Net) for automated quantification of CTLA-4+ cells and another deep NN
(DeepLab3+) for the detection of nonspecific (−) CTLA-4 staining. The results for the density of
CTLA-4+ cells in the tumor categories identified a clone-dependent unspecific staining pattern
in adrenal cortical adenoma (63%) for MSVA-152R and in pheochromocytoma (67%). The authors found
that high CTLA-4+ cell density was associated with a low pT category, absent lymph node
metastases, and PD-L1 expression in tumor or inflammatory cells. Overall, the study demonstrated
the ability of DL-assisted approaches to assist with immunostaining and identified potentially
novel biological links between CTLA-4 lymphocytes and prognostic cancer features. The study was
limited by sample sizes and potential issues with cross-reactivity that may hinder
reproducibility across all tumor subtypes. Moreover, there was no external dataset used for
validation of this approach. Finally, prior meta-analyses have indicated that there is no
significant association between CTLA-4 expression and overall survival in multiple cancer
subtypes, contradicting some of this study’s findings.42

DISCUSSION

Major developments in DL have been enabled by the explosion of data availability and computing
power, enabling automated pathology image segmentation analysis to ultimately augment workflows
for pathologic assessment of endocrine neoplasms.43 Despite promising progress, there are still
significant shortcomings seen across medical applications of DL, including within the subfield of
DL for histopathology in endocrine neoplasms.
First, centralized data repositories must be an area of emphasis across all institutions.
Although novel ML and statistical methodologies have shown some promise in mitigating limited
datasets during model training through augmentation,44–46 significant data diversity is necessary
to enable accurate assessments of model accuracy. Without multi-institutional datasets, it will
be difficult to develop robust algorithms that are exposed to a sufficient diversity of training
data and that generalize well, particularly in rare neoplastic subtypes such as pituitary tumors.
Furthermore, increased dataset variability will enable more methodologically rigorous evaluations
of models on external, novel datasets to assess generalizability. This concept has been shown to
be particularly relevant within the subfield of AI in histopathology, as batch effects can
significantly affect results.47 Although the internal cross-validation-based approaches seen in
many of the aforementioned studies can demonstrate some assessment of a model’s capabilities, it
is insufficient to rely on these results, particularly in areas such as medical diagnosis, which
require a tremendous amount of precision and stability before real-world implementation.
Another consideration that can enhance interstudy comparisons is the need for consensus reporting
standards and evaluation criteria. Currently, efforts such as the Transparent Reporting of a
multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) and additional
evolving guidelines such as TRIPOD-AI48 are focused on developing guidelines and standards for
reporting prediction models. Currently, studies demonstrate significant variability in how
results are reported, making it exceptionally difficult for medical practitioners to assess
relative performance across studies. An increased focus on rigorous and standardized metrics will
not only enable cross-study comparisons but also serve to increase adoption of DL approaches
across medical subspecialties.
DL applications in histopathology have demonstrated the potential to reduce pathologist workloads
and increase our understanding of the histologic landscape of endocrine neoplasms. It is
important to note that, although AI platforms have demonstrated some promise in the analysis of
histopathological samples, they are not in the position to replace pathologist interpretation
altogether. Pathologists must integrate clinical context and understanding of disease processes
as well as incorporate a patient’s individual circumstances to make an informed decision.
Ultimately, this multimodal process is not well approximated by existing AI algorithms, which
will be best used as tools deployed to work in conjunction with pathologists for the foreseeable
future.
Ultimately, to transition from experimental analysis to the clinic, there must be significant
progress from federal and institutional levels to ensure that there is clarity on model efficacy
as well as clarity surrounding logistical considerations that can drive clinical implementation.
Given the growing need for histopathological analysis globally, AI tools have demonstrated
promise in enabling more efficient workflows to enhance physician productivity.

DISCLOSURE

This study was supported by the Burroughs Wellcome Fund Early Scientific Training to Prepare for
Research Excellence Post-Graduation (BEST-PREP). ATP reports effort support via grants from
NIH/NCI U01-CA243075, grants from NIH/NIDCR R56-DE030958, and grants from SU2C (Stand Up to
Cancer) Fanconi Anemia Research Fund–Farrah Fawcett Foundation Head and Neck Cancer Research Team
Grant during the conduct of the study; ATP reports grants from Abbvie via UChicago–Abbvie Joint
Steering Committee Grant, and grants from Kura Oncology. ATP reports personal fees from Prelude
Therapeutics Advisory Board, personal fees from Elevar Advisory Board, and personal fees from
Ayala Advisory Board outside of submitted work.

ACKNOWLEDGMENTS

The authors appreciate the opportunity and guidance from Dr. Nicole Cipriani in the composition
of this article.

REFERENCES

1. Tran KA, Kondrashova O, Bradley A, et al. Deep learning in cancer diagnosis, prognosis and
treatment selection. Genome Med 2021;13:152.

2. Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 2016;316(22):2402–10.
3. Giudicessi JR, Schram M, Bos JM, et al. Artificial intelligence–enabled assessment of the heart rate corrected QT interval using a mobile electrocardiogram device. Circulation 2021;143(13):1274–86.
4. Ratner M. FDA backs clinician-free AI imaging diagnostic tools. Nat Biotechnol 2018;36(8):673–4.
5. Luchini C, Pea A, Scarpa A. Artificial intelligence in oncology: current applications and future perspectives. Br J Cancer 2022;126(1):4–9.
6. Kleppe A, Skrede OJ, De Raedt S, et al. Designing deep learning studies in cancer diagnostics. Nat Rev Cancer 2021;21(3):199–211.
7. Chary MA, Manini AF, Boyer EW, et al. The Role and Promise of Artificial Intelligence in Medical Toxicology. J Med Toxicol 2020;16(4):458–64.
8. Rajkomar A, Dean J, Kohane I. Machine Learning in Medicine. N Engl J Med 2019;380(14):1347–58.
9. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–44.
10. Esteva A, Robicquet A, Ramsundar B, et al. A guide to deep learning in healthcare. Nat Med 2019;25(1):24–9.
11. van der Laak J, Litjens G, Ciompi F. Deep learning in histopathology: the path to the clinic. Nat Med 2021;27(5):775–84.
12. Ahmad Z, Rahim S, Zubair M, et al. Artificial intelligence (AI) in medicine, current applications and future role with special emphasis on its potential and promise in pathology: present and future impact, obstacles including costs and acceptance among pathologists, practical and philosophical considerations. A comprehensive review. Diagn Pathol 2021;16(1):24.
13. Lin YJ, Chao TK, Khalil MA, et al. Deep learning fast screening approach on cytological whole slides for thyroid cancer diagnosis. Cancers (Basel) 2021;13(15):3891.
14. Dov D, Kovalsky SZ, Assaad S, et al. Weakly supervised instance learning for thyroid malignancy prediction from whole slide cytopathology images. Med Image Anal 2021;67:101814.
15. Elliott Range DD, Dov D, Kovalsky SZ, et al. Application of a machine learning algorithm to predict malignancy in thyroid cytopathology. Cancer Cytopathology 2020;128(4):287–95.
16. Sanyal P, Mukherjee T, Barui S, et al. Artificial intelligence in cytopathology: A neural network to identify papillary carcinoma on thyroid fine-needle aspiration cytology smears. J Pathol Inform 2018;9(1):43.
17. Wang Y, Guan Q, Lao I, et al. Using deep convolutional neural networks for multi-classification of thyroid tumor by histopathology: a large-scale pilot study. Ann Transl Med 2019;7(18):468.
18. El-Hossiny AS, Al-Atabany W, Hassan O, et al. Classification of Thyroid Carcinoma in Whole Slide Images Using Cascaded CNN. IEEE Access 2021;9:88429–38.
19. Böhland M, Tharun L, Scherr T, et al. Machine learning methods for automated classification of tumors with papillary thyroid carcinoma-like nuclei: A quantitative analysis. PLoS One 2021;16(9):e0257635.
20. Dolezal JM, Trzcinska A, Liao CY, et al. Deep learning prediction of BRAF-RAS gene expression signature identifies noninvasive follicular thyroid neoplasms with papillary-like nuclear features. Mod Pathol 2021;34(5):862–74.
21. Anand D, Yashashwi K, Kumar N, et al. Weakly supervised learning on unannotated H&E-stained slides predicts BRAF mutation in thyroid cancer with high accuracy. J Pathol 2021;255(3):232–42.
22. Tsou P, Wu CJ. Mapping driver mutations to histopathological subtypes in papillary thyroid carcinoma: applying a deep convolutional neural network. J Clin Med 2019;8(10):1675.
23. Fu Y, Jung AW, Torne RV, et al. Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis. Nat Cancer 2020;1(8):800–10.
24. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L. ImageNet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition; 2009:248–255. doi:10.1109/CVPR.2009.5206848.
25. Cancer of the pancreas - cancer stat facts. SEER. Available at: https://seer.cancer.gov/statfacts/html/pancreas.html. Accessed April 5, 2022.
26. Kenner B, Chari ST, Kelsen D, et al. Artificial intelligence and early detection of pancreatic cancer. Pancreas 2021;50(3):251–79.
27. Fu H, Mi W, Pan B, et al. Automatic pancreatic ductal adenocarcinoma detection in whole slide images using deep convolutional neural networks. Front Oncol 2021;11:665929.
28. Wu W, Liu X, Hamilton RB, et al. Graph Convolutional Neural Networks for Histological Classification of Pancreatic Cancer 2022;28. https://doi.org/10.1101/2022.01.26.22269832.
29. Chang YH, Thibault G, Madin O, et al. Deep learning based Nucleus Classification in pancreas histological images. In: 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); 2017:672–675. doi:10.1109/EMBC.2017.8036914.
30. Kriegsmann M, Kriegsmann K, Steinbuss G, et al. Deep learning in pancreatic tissue: identification of anatomical structures, pancreatic intraepithelial neoplasia, and ductal adenocarcinoma. Int J Mol Sci 2021;22(10):5385.
31. Naito Y, Tsuneki M, Fukushima N, et al. A deep learning model to detect pancreatic ductal adenocarcinoma on endoscopic ultrasound-guided fine-needle biopsy. Sci Rep 2021;11(1):8454.
32. Dasari A, Shen C, Halperin D, et al. Trends in the incidence, prevalence, and survival outcomes in patients with neuroendocrine tumors in the United States. JAMA Oncol 2017;3(10):1335–42.
33. Wallace PW, Conrad C, Brückmann S, et al. Metabolomics, machine learning and immunohistochemistry to predict succinate dehydrogenase mutational status in phaeochromocytomas and paragangliomas. J Pathol 2020;251(4):378–87.
34. Pantelis AG, Panagopoulou PA, Lapatsanis DP. Artificial intelligence and machine learning in the diagnosis and management of gastroenteropancreatic neuroendocrine neoplasms—a scoping review. Diagnostics 2022;12(4):874.
35. Redemann J, Schultz FA, Martinez C, et al. Comparing deep learning and immunohistochemistry in determining the site of origin for well-differentiated neuroendocrine tumors. J Pathol Inform 2020;11:32.
36. Govind D, Jen KY, Matsukuma K, et al. Improving the accuracy of gastrointestinal neuroendocrine tumor grading with deep learning. Sci Rep 2020;10(1):11064.
37. Matsukuma K, Olson KA, Gui D, et al. Synaptophysin-Ki67 double stain: a novel technique that improves interobserver agreement in the grading of well-differentiated gastrointestinal neuroendocrine tumors. Mod Pathol 2017;30(4):620–9.
38. Chen C, Hu Y, Lyu L, et al. Incidence, demographics, and survival of patients with primary pituitary tumors: a SEER database study in 2004–2016. Sci Rep 2021;11(1):15155.
39. Correa P, Chen VW. Endocrine gland cancer. Cancer 1995;75(1 Suppl):338–52.
40. Thomasian NM, Kamel IR, Bai HX. Machine intelligence in non-invasive endocrine cancer diagnostics. Nat Rev Endocrinol 2022;18(2):81–95.
41. Dum D, Henke TLC, Mandelkow T, et al. Semi-automated validation and quantification of CTLA-4 in 90 different tumor entities using multiple antibodies and artificial intelligence. Lab Invest 2022;1–8. https://doi.org/10.1038/s41374-022-00728-4.
42. Hu P, Liu Q, Deng G, et al. The prognostic value of cytotoxic T-lymphocyte antigen 4 in cancers: a systematic review and meta-analysis. Sci Rep 2017;7(1):42913.
43. Kochanny S, Pearson A. Academics as leaders in the cancer artificial intelligence revolution. Cancer 2020;127. https://doi.org/10.1002/cncr.33284.
44. Sandfort V, Yan K, Pickhardt PJ, et al. Data augmentation using generative adversarial networks (CycleGAN) to improve generalizability in CT segmentation tasks. Sci Rep 2019;9(1):16884.
45. Mikołajczyk A, Grochowski M. Data augmentation for improving deep learning in image classification problem. In: 2018 International Interdisciplinary PhD Workshop (IIPhDW); 2018:117–122. doi:10.1109/IIPHDW.2018.8388338.
46. Wei J, Suriawinata A, Vaickus L, et al. Generative Image Translation for Data Augmentation in Colorectal Histopathology Images. Proc Mach Learn Res 2019;116:10–24.
47. Howard FM, Dolezal J, Kochanny S, et al. The impact of site-specific digital histology signatures on deep learning model accuracy and bias. Nat Commun 2021;12(1):4423.
48. Collins GS, Dhiman P, Navarro CLA, et al. Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence. BMJ Open 2021;11(7):e048008.
