
Copyright © 2022 by The Journal of Bone and Joint Surgery, Incorporated

Current Concepts Review


Deep Learning and Imaging for the
Orthopaedic Surgeon
How Machines “Read” Radiographs

Brandon G. Hill, MSc, Justin D. Krogue, MD, David S. Jevsevar, MD, MBA, and Peter L. Schilling, MD, MSc

Investigation performed at Dartmouth Hitchcock Medical Center, Lebanon, New Hampshire

• In the not-so-distant future, orthopaedic surgeons will be exposed to machines that begin to automatically "read" medical imaging studies using a technology called deep learning.

• Deep learning has demonstrated remarkable progress in the analysis of medical imaging across a range of modalities that are commonly used in orthopaedics, including radiographs, computed tomographic scans, and magnetic resonance imaging scans.

• There is a growing body of evidence showing clinical utility for deep learning in musculoskeletal radiography, as evidenced by studies that use deep learning to achieve an expert or near-expert level of performance for the identification and localization of fractures on radiographs.

• Deep learning is currently in the very early stages of entering the clinical setting, involving validation and proof-of-concept studies for automated medical image interpretation.

• The success of deep learning in the analysis of medical imaging has been propelling the field forward so rapidly that now is the time for surgeons to pause and understand how this technology works at a conceptual level, before (not after) the technology ends up in front of us and our patients. That is the purpose of this article.

In 2012, a research group led by Geoffrey Hinton achieved a dramatic milestone in the ability of a computer to automatically identify objects in images1. Their work was a testament to the power and capability of machine learning—the idea of using algorithms to identify patterns in data, to build mathematical models based on those patterns, and to use those models to make a determination or prediction about something in the world2-4. This is a departure from the usual way of doing things. Traditionally, the way to enable a computer to do anything was to enter explicit instructions, line-by-line in the form of human-authored computer code. Machine learning is different. It does not require step-by-step guidance from a human; rather, humans need only supply data and a learning system, and then the computer learns patterns on its own5. Hinton's group specifically demonstrated the capacity of deep learning, a branch of machine learning in which mathematical models learn to make predictions directly from unprocessed data (such as images)3,6. Excitement over their advancement ignited a research renaissance in deep learning and the broader field of machine learning. Since then, the results have entered our daily lives in numerous forms, from digital assistants we talk to, to self-driving cars, to automated drug discovery7-17.

Disclosure: The Disclosure of Potential Conflicts of Interest forms are provided with the online version of the article (http://links.lww.com/JBJS/H59).

Copyright © 2022 by The Journal of Bone and Joint Surgery, Incorporated. This is an open-access article distributed under the terms of the Creative Commons Attribution-Non Commercial-No Derivatives License 4.0 (CC BY-NC-ND), where it is permissible to download and share the work provided it is properly cited. The work cannot be changed in any way or used commercially without permission from the journal.

J Bone Joint Surg Am. 2022;00:1-12. http://dx.doi.org/10.2106/JBJS.21.01387



TABLE I A Nonexhaustive Review of Use Cases and Representative Studies Demonstrating the Current State of Deep Learning Across Musculoskeletal Radiography

Diagnosis (including classification, staging, and severity of disease)

Fracture identification, classification, and localization. Representative studies: see Table II.

Implant loosening. Shah et al. (2020)33; Level of Evidence: II. Incremental inputs improve the automated detection of implant loosening using machine-learning algorithms: The study used a CNN to predict implant loosening from radiographs in first-time revision total hip and knee arthroplasty patients with 88% accuracy by using a combination of radiographs in conjunction with patient characteristics.

Bone tumor classification. von Schacky et al. (2021)34; Level of Evidence: II. Multitask deep learning for segmentation and classification of primary bone tumors on radiographs: This work trained a CNN to simultaneously localize and classify primary bone tumors on radiographs. Trained on 934 radiographs, benign or malignant bone tumors were diagnosed using the patient histopathologic findings as the reference standard. For the classification of bone tumors as malignant or benign, the model achieved 80% accuracy, 63% sensitivity, and 88% specificity. This classification accuracy was higher than that of 2 radiology residents (71% and 65%; p = 0.002 and p < 0.001, respectively). It was comparable with that of 2 musculoskeletal fellowship-trained radiologists (84% and 83%; p = 0.13 and p = 0.25, respectively).

Knee osteoarthritis severity. Norman et al. (2019)21; Level of Evidence: II. Applying densely connected convolutional neural networks for staging osteoarthritis (OA) severity from radiographs: The study used a CNN to automatically stage osteoarthritis severity according to Kellgren-Lawrence grade from radiographs of the knee in an adult population. The model was trained on 4,490 bilateral PA fixed-flexion knee radiographs from adults in the U.S. The model achieved sensitivity rates for no, mild, moderate, and severe OA of 84%, 70%, 69%, and 86%, respectively. The corresponding specificity rates were 86%, 84%, 97%, and 99%.

Osteoporosis. Yamamoto et al. (2020)35; Level of Evidence: II. Deep learning for osteoporosis classification using hip radiographs and patient clinical covariates: The authors used deep learning to diagnose osteoporosis from radiographs of the hip (T-score < -2.5) using a data set of 1,131 images from patients who underwent both skeletal bone mineral density measurement and hip radiography at a single hospital in Japan from 2014 to 2019. The CNN applied to hip radiographs alone exhibited high accuracy (approx. 84%), sensitivity (approx. 81%), and specificity (approx. 88%), and the performance improved further with the addition of clinical covariates from patient records (approx. 89%, 89%, and 88%, respectively).

Pediatric bone age. Halabi et al. (2019)27; Level of Evidence: II. The RSNA pediatric bone age machine learning challenge: The challenge asked competitors to create a model using machine learning that could accurately determine skeletal age in a data set of pediatric hand radiographs (n = 14,236). The best models were all CNNs, and the top performing models were, on average, accurate to within 4.2 to 4.5 months of the reference standard, bone age.

Cause of shoulder pain. Grauhan et al. (2022)26; Level of Evidence: II. Deep learning for accurately recognizing common causes of shoulder pain: The authors trained a CNN to automatically detect the most common causes of shoulder pain on radiographs, including proximal humeral fractures, joint dislocations, periarticular calcification, osteoarthritis, osteosynthesis, and a joint endoprosthesis. The model was trained on 2,700 shoulder radiographs from multiple institutions. Model performance was variable across the 6 diagnoses: sensitivity and specificity were 75% and 86% for fractures, 95% and 65% for joint dislocation, and 90% and 86% for osteoarthrosis.

Implant identification

Arthroplasty implant identification. Karnuta et al. (2021)29; Level of Evidence: II. Artificial intelligence to identify arthroplasty implants from radiographs of the hip: The authors trained, validated, and tested a CNN to classify total hip arthroplasty femoral implants as 1 of 18 different models from 1,972 retrospectively collected radiographs of the hip. The CNN discriminated 18 implant models with AUC of 0.99, accuracy of 99%, sensitivity of 94%, and specificity of 99% in the external-testing data set.

Prognosis and risk stratification

Risk of dislocation in total hip arthroplasty. Rouzrokh et al. (2021)32; Level of Evidence: II. Deep learning artificial intelligence model for assessment of hip dislocation risk following primary hip arthroplasty from postoperative radiographs: The study presented a CNN trained to predict the risk of future hip dislocation from postoperative radiographs and succeeded in achieving high sensitivity (89%) and high negative predictive value (99%) in the external-testing data set. The study also demonstrated the use of saliency maps to highlight the radiographic features that the model found to be most predictive of prosthesis dislocation.

Bone mineral density. Hsieh et al. (2021)28; Level of Evidence: II. Automated bone mineral density prediction and fracture risk assessment using radiographs via deep learning: This study examined how successfully a CNN can measure bone mineral density and evaluate fracture risk through a radiograph. Results were compared against dual x-ray absorptiometry (DXA) measurements and fracture risk assessment tool (FRAX) scores. Both hips (5,633 training radiographs) and the lumbar spine (7,307 training radiographs) were examined. The AUC and accuracy were 0.89 and 92% for detecting hip osteoporosis and 0.96 and 90% for high hip fracture risk. When applied to the lumbar spine radiographs, the scores were 0.89 and 86% for spine osteoporosis, and 0.83 and 95% for high 10-year fracture risk. This capability would allow for evaluating fracture risk using radiographs already made for other indications to identify at-risk patients, without additional cost, time, and radiation from DXA.

Measurements

Leg lengths. Zheng et al. (2020)25; Level of Evidence: II. Deep learning measurement of leg-length discrepancy in children based on radiographs: This study created a method for automating the measurement of leg length discrepancies in pediatric patients using deep learning. The authors trained and tested a CNN using 179 radiographs and found that deep learning-derived measurements were strongly correlated with those that were manually derived by pediatric radiologists (r = 0.92 and mean absolute error of 0.51 cm for full leg length discrepancy, p < 0.001).

Cobb angle. Horng et al. (2019)18; Level of Evidence: II. Cobb angle measurement of the spine from radiographs using convolutional neural networks: The authors created an automatic system for measuring spine curvature via Cobb angle by applying a CNN to 595 AP spinal radiographs. The deep learning-derived measurements did not demonstrate any significant differences compared with the manual measurements made by clinical doctors (p < 0.98).

Scientific discovery

Insights into pain disparities in underserved populations. Pierson et al. (2021)22; Level of Evidence: II. An algorithmic approach to reducing unexplained variation in pain disparities in underserved populations: Underserved populations experience higher levels of pain. Using osteoarthritis as an example, this study looked at whether this pain can be detected in radiographs. Traditional methods to objectively measure osteoarthritic severity (i.e., Kellgren-Lawrence grade) account for only 9% of this disparity. By training a CNN to predict pain levels from 25,049 radiographs, algorithmic predictions accounted for 43% of disparities. This implies that much of the osteoarthritic pain felt by underserved patients stems from visible factors within the knee that are not captured in standard radiographic measures of severity such as Kellgren-Lawrence grade.

*PA = posteroanterior, RSNA = Radiological Society of North America, AUC = area under the receiver operating characteristic curve, and AP = anteroposterior.

The impact of deep learning is expected to be just as profound in clinical medicine as it already has been in daily life. This is particularly true for medical imaging. Deep learning has demonstrated remarkable progress in the analysis of medical imaging across a range of modalities including radiographs, computed tomographic (CT) scans, and magnetic resonance imaging (MRI) scans18-35. There is a growing body of evidence showing clinical utility for deep learning in musculoskeletal radiography (Table I), as evidenced by studies that use deep learning to achieve an expert or near-expert level of performance for the identification and localization of fractures on radiographs (Table II)31,36-43.

TABLE II Select Studies That Demonstrate the Potential of Deep Learning for the Automated Detection of Bone Fractures in Radiographs Across Different Regions of Human Anatomy*

Hip

Krogue et al. (2020)31; Level of Evidence: II. Objectives: fracture detection, fracture classification, and fracture localization. Prediction classes: fracture or no fracture; multiclass (no fracture, displaced femoral neck fracture, nondisplaced femoral neck fracture, intertrochanteric fracture, and preexisting implant); localization of fracture by bounding box. Study cohort and image sources: retrospective cohort of all hip and pelvic radiographic studies obtained in the ED of a single institution with the words "intertrochanteric" or "femoral neck" occurring near "fracture" in the radiology report, 1998 to 2017. Reference standard: reviewed by 2 orthopaedic residents; in cases of uncertainty, CT and MRI scans and postoperative imaging were reviewed. Training data set size: 1,815. Performance: AUC 0.98, sensitivity 93%, specificity 94%, accuracy 94%.

Cheng et al. (2019)37; Level of Evidence: II. Objective: fracture detection. Prediction classes: fracture or no fracture. Study cohort and image sources: retrospective cohort of pelvic radiographs from trauma patients seen at a single institution from 2008 to 2017. Reference standard: diagnosis derived from a trauma registry reviewed by a trauma surgeon; other imaging modalities and clinical course were reviewed in equivocal cases. Training data set size: 3,605. Performance: AUC 0.98, sensitivity 98%, specificity 84%, accuracy 91%.

Urakawa et al. (2019)42; Level of Evidence: II. Objective: fracture detection (intertrochanteric). Prediction classes: fracture or no fracture. Study cohort and image sources: retrospective cohort of all hip radiographs from patients with intertrochanteric hip fractures treated with compression hip screws from 2006 to 2017. Reference standard: radiology reports were reviewed by a single board-certified orthopaedic surgeon. Training data set size: 2,678. Performance: AUC 0.98, sensitivity 94%, specificity 97%, accuracy 96%.

Shoulder

Chung et al. (2018)38; Level of Evidence: II. Objectives: fracture detection and fracture classification. Prediction classes: Neer classification plus no fracture. Study cohort and image sources: retrospective cohort of all shoulder AP radiographs from 7 hospitals. Reference standard: 2 experienced shoulder orthopaedic specialists and 1 musculoskeletal radiologist; when the independent reports did not agree, CT scans and other imaging were reviewed. Training data set size: 1,891. Performance: AUC 1.00, sensitivity 99%, specificity 97%, accuracy 96%.

Wrist

Thian et al. (2019)43; Level of Evidence: II. Objectives: fracture detection and fracture localization. Prediction classes: fracture or no fracture; estimates of fracture localization. Study cohort and image sources: retrospective cohort of AP and lateral wrist radiographs in a single institution between 2015 and 2017. Reference standard: 3 experienced radiologists; questionable images were labeled by consensus. Training data set size: 14,614. Performance: AUC 0.90, sensitivity 98%, specificity 73%, accuracy NS.

Kim and MacKinnon (2018)40; Level of Evidence: II. Objective: fracture detection. Prediction classes: fracture or no fracture. Study cohort and image sources: retrospective cohort of lateral wrist radiographs at a single institution between 2015 and 2016. Reference standard: report reviewed by a radiology registrar competent in the reporting of radiographs and with 3 years of radiology experience. Training data set size: 1,111. Performance: AUC 0.95, sensitivity 90%, specificity 88%, accuracy NS.

Spine

Chen et al. (2021)36; Level of Evidence: II. Objective: fracture detection. Prediction classes: fracture or no fracture. Study cohort and image sources: retrospective cohort of plain abdominal frontal radiographs obtained from a single institution from 2015 to 2018. Reference standard: initially labeled based on the diagnosis in a registry; final diagnosis determined via the agreement of a radiologist and spine surgeon using any supportive images available. Training data set size: 1,045. Performance: AUC 0.72, sensitivity 74%, specificity 73%, accuracy 74%.

Ankle

Kitamura et al. (2019)41; Level of Evidence: II. Objective: fracture detection. Prediction classes: fracture or no fracture. Study cohort and image sources: retrospective cohort; studies of ankle fractures were identified by parsing radiology reports. Reference standard: reviewed by a board-certified radiologist and a fourth-year radiology resident. Training data set size: 1,441. Performance: AUC NS, sensitivity 80%, specificity 83%, accuracy 81%.

All

Jones et al. (2020)39; Level of Evidence: II. Objectives: fracture detection and fracture localization. Prediction classes: fracture or no fracture; estimates of fracture localization. Study cohort and image sources: retrospective cohort of radiographs of 16 anatomical regions from 15 hospitals and outpatient care centers in the U.S. Reference standard: data set manually annotated by 18 orthopaedic surgeons and 11 radiologists. Training data set size: 715,343. Performance: AUC 0.97, sensitivity 95%, specificity 81%, accuracy NS.

*AUC = area under the receiver operating characteristic curve, ED = emergency department, CT = computed tomographic, MRI = magnetic resonance imaging, AP = anteroposterior, and NS = not specified. The reference standard is the source of "ground truth," in this case, the source of truth for whether a radiograph demonstrates (or does not demonstrate) a fracture. The size of the training data set is the number of example radiographs used to train the model. The performance measures used for deep learning models are conceptually the same as those used to evaluate diagnostic screening tests used in medicine and include measures such as the AUC, sensitivity, specificity, and accuracy. In general, these are measures that compare a model's predictions (e.g., fracture/no fracture) to ground truth using a testing data set.
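As the table note describes, these performance measures are the same ones used for diagnostic screening tests. As a brief illustration of how they are computed from a model's predictions, the Python sketch below tallies true and false positives and negatives against ground truth; the labels and predictions are invented purely for this example and do not come from any study in the table.

```python
import numpy as np

# Hypothetical test set: ground-truth labels and a model's predictions
# (1 = fracture, 0 = no fracture). Values are invented for illustration.
truth       = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
predictions = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 0])

tp = np.sum((predictions == 1) & (truth == 1))  # true positives
tn = np.sum((predictions == 0) & (truth == 0))  # true negatives
fp = np.sum((predictions == 1) & (truth == 0))  # false positives
fn = np.sum((predictions == 0) & (truth == 1))  # false negatives

sensitivity = tp / (tp + fn)        # fractures correctly flagged
specificity = tn / (tn + fp)        # normal radiographs correctly cleared
accuracy = (tp + tn) / len(truth)   # overall agreement with ground truth
print(sensitivity, specificity, accuracy)  # 0.75, 0.833..., 0.8
```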

Until recently, these deep learning algorithms had been confined to research papers, narrow tasks, and specific regions of human anatomy, but the technology is advancing rapidly. Deep learning is now in the early stages of entering the clinical setting, involving validation and proof-of-concept studies44,45. Only time will tell, but we believe that one thing is certain: The success of deep learning in the analysis of medical imaging has been propelling the field forward so rapidly that now is the time for surgeons to pause and understand how this technology works at a conceptual level, before the technology ends up in front of us and our patients. This article is intended to provide surgeons with this basic level of understanding of how current deep learning methods work. We do this with a foundational example—explaining how a deep learning method called convolutional neural networks (CNNs) enables a computer to "read" radiographs to detect the presence of a fracture.

How a CNN Model Works

Overview: Layers of Mathematical Functions
Deep learning is not magic; it is mathematics. Computers "think" in numbers, so for a computer to "see" a radiograph, the information in the image must be put into a form that computers can process, perform calculations on, and analyze. Thus, computers "see" a digitized image as numbers, and make sense of these images-made-numeric using mathematical functions. Stated simply, CNNs are mathematical functions that are uniquely suited for the analysis of images. As in all mathematical functions, CNNs have inputs (in this case, the numeric representation of the image) and a final output (a numeric prediction of what it "sees" in the image)3. CNNs are very complex, but one does not need to understand the mathematical details to get a sense of how these functions work.

Fig. 1
A biologic neuron versus an artificial neuron. The original inspiration for the mathematical structure of all deep-learning systems (including CNNs) came from the network of neurons in the visual cortex of mammals. In place of a vast network of interconnected biologic neurons, deep learning systems substitute a vast network of interconnected mathematical functions. While biologic neurons exchange electrical signals, the artificial neurons of these networks exchange the results of their functions. That is, an artificial neural network (ANN) can be thought of as a complex mathematical function made up of a large number of more basic mathematical functions that "connect" and "send signals" between each other in ways that are reminiscent of the synapses and action potentials of biologic neurons. ANNs are typically organized in layers of interconnected neurons. The mathematical function in each neuron takes weighted inputs (that is, excitatory or inhibitory signals at a biologic neuron's dendrites) and sums them to produce an output or "activation" (something akin to a biologic neuron's action potential). The output is then passed on to the neuron(s) in the next layer of the network, where it is received as a weighted input. Thus, the "deep" in deep learning refers to the mathematical depth—the number of layers of mathematical functions that make up the more complex mathematical function that is the neural network in its totality. It is this layering of mathematical functions that enables ANNs (like CNNs) to capture complex, nonlinear relationships, as is required in the identification of a fracture within a radiograph.

The original inspiration for the mathematical structure of all deep learning systems (including CNNs) came from the network of neurons in the brain (Fig. 1)46-48. In place of a network of interconnected biologic neurons, deep learning systems substitute a network of interconnected mathematical functions referred to as an artificial neural network (ANN). While biologic neurons exchange electrical signals, the artificial neurons in an ANN exchange the results of their mathematical functions. One should think of the ANN in its entirety as a complex mathematical function made up of a large number of more basic mathematical functions that "connect" and "send signals" between each other in ways that are reminiscent of the synapses and action potentials of biologic neurons.

ANNs are organized in layers of interconnected neurons. A layer is a general term for a group of neurons that work together at a specific depth within the ANN. The mathematical function in each neuron takes weighted inputs (think excitatory or inhibitory signals at a biologic neuron's dendrites) and sums them to produce an output or "activation" (something akin to a biologic neuron's action potential). The output is then passed on to the neuron(s) in the next layer of the network, where it is received as a weighted input.

Different layers in an ANN perform distinct roles. These layers are categorized as input, hidden, and output layers3. Every ANN starts with an input layer and ends with an output layer. Not surprisingly, the input layer is responsible for receiving the inputs (in our example, a radiograph). The output layer is responsible for providing us an answer (e.g., how confident the ANN is that a radiograph demonstrates a fracture). An ANN also typically has a variable number of hidden layers—hidden because they are stacks of mathematical functions that reside in the middle, between input and output layers, shielded from view (Fig. 2).

The hidden layers are where most of the "magic" happens. That is because the addition of hidden layers of mathematical functions adds depth. The "deep" in deep learning refers to the mathematical depth—the number of layers of mathematical functions that make up the more complex mathematical function that is the neural network in its totality. It is this layering of mathematical functions that enables ANNs to capture complex, nonlinear relationships. There may be dozens of these so-called hidden layers within a deep ANN, but not all layers are the same3. Different types of hidden layers use different mathematical functions, and some layers are better suited for some tasks than others. The types of layers commonly used for "reading" a radiograph include convolutional layers, pooling layers, and fully connected layers, all of which will be discussed3.
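To make the weighted-sum-and-activation arithmetic described above concrete, here is a minimal sketch in Python of a single artificial neuron. The signal values, weights, bias, and the choice of a ReLU activation are illustrative assumptions, not details taken from any model discussed in this article.

```python
import numpy as np

def artificial_neuron(inputs, weights, bias):
    # Weighted sum of incoming signals, akin to excitatory and
    # inhibitory inputs arriving at a biologic neuron's dendrites.
    weighted_sum = np.dot(inputs, weights) + bias
    # "Activation": the neuron passes a signal on only when the
    # summed input is positive (a ReLU, one common choice).
    return max(0.0, weighted_sum)

# Three signals from neurons in the previous layer (values are arbitrary).
signals = np.array([0.8, 0.1, 0.5])
# Learned weights: positive weights excite, negative weights inhibit.
weights = np.array([0.9, -0.4, 0.2])
output = artificial_neuron(signals, weights, bias=-0.3)
print(output)  # this value becomes a weighted input to the next layer
```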

Fig. 2
ANNs are a network of artificial neurons, depicted as circles. ANNs are often arranged in layers. It starts with the input layer: The input layer is a column of hundreds of thousands of neurons (mathematical functions) that are each fed a small part of the raw input (for example, a radiograph). The output of each neuron is then sent to hundreds or thousands of neurons in the first hidden layer. Each mathematical function in this hidden layer takes the hundreds or thousands of inputs it receives and sends its own mathematical output to the next hidden layer. This process repeats itself throughout each layer of the ANN: each layer takes its input, performs calculations via its neurons, and then transmits its output on to the subsequent layer. The final output layer typically only has a small number of neurons. In the case of 1 neuron, the final output is a number showing how confident the ANN is that something is true (e.g., the input radiograph shows a fracture). In the case of several output neurons (as depicted here), each output number would show how confident the ANN is about several predictions (e.g., the input radiograph shows a femoral neck fracture versus intertrochanteric fracture versus subtrochanteric hip fracture, etc.).

The final output layer is responsible for providing an answer (e.g., does the radiograph demonstrate a fracture?). The output layer typically has only a small number of neurons. In the case with 1 output neuron, the final output is a number showing how confident the ANN is that something is true (e.g., the input radiograph shows a fracture). In the case with several output neurons, each output number would show how confident the ANN is about several predictions (e.g., the input radiograph shows a femoral neck versus an intertrochanteric hip fracture, etc.).
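To illustrate what those several output numbers might look like, the short sketch below applies the softmax function, one common way of converting the raw outputs of an output layer into confidences that sum to 1. The raw output values are invented for the example, and softmax is an assumption here rather than the specific choice of any model cited in this article.

```python
import numpy as np

# Hypothetical raw outputs ("logits") from a 4-neuron output layer.
class_names = ["no fracture", "femoral neck", "intertrochanteric", "subtrochanteric"]
raw_outputs = np.array([0.2, 2.9, 1.1, 0.4])

# Softmax turns raw outputs into confidences that sum to 1.
confidences = np.exp(raw_outputs) / np.sum(np.exp(raw_outputs))

for name, confidence in zip(class_names, confidences):
    print(f"{name}: {confidence:.2f}")
```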

Fig. 3
As in the visual cortex of mammals, the earliest neurons in a CNN detect relatively simple features while the downstream neurons use this output to detect
more complex shapes referred to as high-level features. Low-level features are shapes like lines and edges. Mid-level features might be something akin to a
cortical border, the edge of the joint line, etc. High-level features might look to a human like a femoral diaphysis or a femoral head in its entirety. It is critical to
understand that humans do not tell the computer what visual features to look for—a jagged edge, a disruption of the cortex—rather, the computer selects
the features on its own, choosing those that are most predictive for the task at hand—in this case, the identification of a fracture.

CNNs are a form of ANN that takes this biologic inspiration further, with chains of interconnected artificial neurons responsible for detecting different shapes or visual features within an image47. As in our visual cortex, the earliest neurons in this chain detect relatively simple visual features, while the downstream neurons use this output to detect more complex shapes referred to as high-level features. Low-level features are shapes like lines and edges. Mid-level features might be something akin to a cortical border or the edge of the joint line in a radiograph. High-level features might look to the human eye like a femoral diaphysis or a femoral head in its entirety (Fig. 3). In a CNN, like the visual cortex, it is the stacking of layers of neurons (i.e., mathematical functions) that enables vision. The layers of mathematical functions define details of images, little bits at a time, starting simply and then building to eventually identify entire objects46.

A More Detailed Look: Computing Pixels
Clearly, there are no rods and cones in a CNN. Instead, a computer uses pixels—the small, illuminated dots that make up images on computer displays. Yet, computers do not "see" illuminated pixels as a human does. Instead, computers represent pixels numerically, assigning a number to each pixel that represents its brightness49-51. White pixels have a value of 1, black pixels have a value of 0, and gray is somewhere in between (Fig. 4). For grayscale images such as radiographs, an image is just a 2-dimensional matrix of numbers. As in a Microsoft Excel spreadsheet, this matrix is merely a table of numbers stored in columns and rows (width times height), with each cell of the table storing the brightness of a pixel in a particular location49,51.

Fig. 4
Digital images are composed of pixels, and computers represent pixels numerically, by assigning a number to each pixel that encodes brightness. For illustrative purposes, a small square of the image along the edge of the iliac crest has been enlarged to show the brightness of 9 pixels and their numeric representation in a matrix.

Using this format, a CNN must mathematically detect shapes within an image's matrix of numbers. The first layer of the CNN starts simply, with neurons containing mathematical functions to detect elementary shapes such as lines or edges (so-called low-level features). To do this, a neuron uses a "drawing" of the shape being detected. This "drawing" is called a filter51. The filter's "drawing" is itself a matrix that looks like a small table of numbers, with each cell of the table characterizing the pixel brightness of the shape. Figure 5 shows a 3 × 3 filter (3 pixels in width times 3 pixels in height) for detecting a right-sided edge. The CNN starts by aligning this filter with the first 3 × 3 pixels in the image and measures how closely those 9 pixels in the image match with the filter. The filter is then moved over 1 pixel to the right, and the degree of similarity is again measured. This process is repeated in a scanning fashion across the entire image, left to right, top to bottom. Mathematically, this is convolving the image with the filter, and it is the reason that a CNN is called a convolutional neural network51. The output of this neuron is called a feature map—a map of how strongly right-sided edges were detected at each point in the original image, with white pixels where the pattern exists and black ones where it does not. Each neuron in the first layer of the CNN has a unique filter and creates its own feature map for the shape it was designed to detect (horizontal, diagonal, or curved lines, etc.). Together, all feature maps are passed to the next layer of the network3,49,51.

Individually, feature maps of lines and edges do not capture very complex shapes and patterns. So, the neurons in the next layer detect more complex shapes by looking at several feature maps from the prior layer. For example, consider a slightly more complex visual feature, such as a corner. For a neural filter in the next layer of the CNN to detect a corner, it must sense the end of a vertical edge meeting the end of a horizontal edge. Only in places where both the vertical and horizontal edge maps from the prior layer record the end of an edge will the corner filter signal the presence of a corner at that location. In reality, each filter can use all of the feature maps from a prior layer of neurons to measure the presence of the compound shape it was designed to detect3.

As described, the system cannot detect any features bigger than 3 × 3 pixels. While larger filters may be used (e.g., 9 × 9 pixels), they can create overly specific filters. To avoid this problem, special layers called pooling layers are included. Pooling layers summarize the visual features that exist in each area of an image52. The layer pools together the most relevant information and abstracts away the less helpful details. It creates a lower-resolution version of the feature maps. It keeps a record of the important and predictive features, while omitting the fine details that may not be useful for the task. For each input feature map, these layers output smaller summarized feature maps. In a max-pooling layer, for example, a 100 × 100 input feature map would be divided into 5 × 5 areas and the largest value in each would be returned53. The end result is a 20 × 20 output that records the presence of important visual features while reducing the input from areas that lack features (e.g., the radiograph's black background).

Fig. 5
To detect a shape (i.e., a visual feature) within an image, a "drawing" of the shape is passed over the image. This drawing is called a filter. The figure shows a 3 × 3 filter used for detection of an edge (right-sided) along with its corresponding matrix values. This filter is passed over the entire image in a scanning fashion, left to right, top to bottom. The value for each pixel is multiplied by the value of the corresponding cell within the 3 × 3 filter, and the result is summed to produce a single value. Mathematically, this is called convolving the image with a filter, and it is repeated over the whole image. The output of this mathematical operation is a new matrix of numerical values called a feature map, presented here as an image. This feature map shows how strongly right-sided edges were detected at each point in the original image, with white pixels where the pattern exists and black pixels where it does not.
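For readers who want to see the convolution arithmetic spelled out, the sketch below builds a tiny made-up "image" as a matrix of brightness values, convolves it with a 3 × 3 right-edge filter, and then applies 2 × 2 max pooling to shrink the resulting feature map. The image values, filter values, and sizes are illustrative assumptions only, not data from a radiograph.

```python
import numpy as np

# A tiny 6x6 "image": dark on the left (0), bright on the right (1),
# so there is a vertical edge running down the middle.
image = np.array([[0, 0, 0, 1, 1, 1]] * 6, dtype=float)

# A 3x3 filter that responds to a dark-to-bright (right-sided) edge.
edge_filter = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]], dtype=float)

def convolve(img, filt):
    """Slide the filter across the image; at each position, multiply
    overlapping values and sum them, producing one feature-map cell."""
    fh, fw = filt.shape
    out_h = img.shape[0] - fh + 1
    out_w = img.shape[1] - fw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            feature_map[i, j] = np.sum(img[i:i+fh, j:j+fw] * filt)
    return feature_map

def max_pool(fmap, size=2):
    """Keep only the largest value in each size x size block,
    shrinking the feature map while preserving strong responses."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    pooled = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            pooled[i, j] = fmap[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return pooled

feature_map = convolve(image, edge_filter)  # large values where the edge sits
print(feature_map)
print(max_pool(feature_map))                # smaller summary of the same map
```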
After several layers in a CNN, the feature maps contain measures of how strongly each higher-level feature was detected. At this stage, the final layers are dedicated to answering whatever question the model was trained for (e.g., is a fracture present?). This decision is made by hundreds of neurons connected in what are called the fully connected layers of the CNN. These fully connected neurons are quite different from their predecessors. Each neuron considers all of the feature measurements from the prior layer3. Each neuron is designed to look for a specific combination of higher-level features to be present and outputs a confidence measure indicating how certain it is that they are present. With multiple fully connected layers, complex decisions can be made as to whether the found features exist in the right combination, amount, spatial layout, and locations for the model to make a decision. To indicate fracture presence, the final layer would be a single neuron, whose output would be a single number—that is, its certainty of whether a fracture exists on a scale of 0 to 1. In the case of fracture classification, the number of neurons in the final layer would be equal to the number of different types of fractures the system can identify (plus 1 for no fracture). Again, the output of each neuron would be a single number, the model's confidence that the given image contains a fracture of a given type (e.g., femoral neck, intertrochanteric, subtrochanteric hip fractures, etc., or no fracture at all).

Figure 6 presents a simplified example of a complete CNN. It has a total of 2 convolutional layers and 2 max-pooling layers before the fully connected layers. Real-world CNNs would have many more convolutional and max-pooling layers to detect enough visual features. In this example, with a score of 0.96, the network is highly confident that a fracture is present.

Fig. 6
A simplified example of a complete CNN. This CNN has a total of 2 convolutional layers and 2 max-pooling layers before the fully connected layers. Real-world networks have many more convolutional and max-pooling layers to detect enough visual features to make accurate predictions. At the end of the fully connected network, a different type of neuron reads the final feature maps and "decides" on the basis of the presence and combination of different visual features whether it believes a fracture is present (expressed on a scale of 0 to 1). In this example, with a score of 0.96, the network is highly confident that a fracture is present.
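To give a rough sense of what such an architecture looks like in code, the sketch below builds a toy version of the network in Figure 6 using the Keras API: two convolutional layers, two max-pooling layers, and fully connected layers ending in a single confidence score. The input resolution, filter counts, and layer widths are invented for illustration and are not the model from any study cited in this article.

```python
from tensorflow.keras import Sequential, layers

# A toy fracture-detection CNN mirroring the structure in Fig. 6:
# two convolutional layers, two max-pooling layers, then fully
# connected layers ending in a single confidence score (0 to 1).
model = Sequential([
    layers.Input(shape=(256, 256, 1)),                    # grayscale radiograph as a matrix
    layers.Conv2D(16, kernel_size=3, activation="relu"),  # low-level features (edges, lines)
    layers.MaxPooling2D(pool_size=2),                     # summarize and downsample
    layers.Conv2D(32, kernel_size=3, activation="relu"),  # higher-level, compound features
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),                                     # feature maps become one long vector
    layers.Dense(64, activation="relu"),                  # fully connected layer
    layers.Dense(1, activation="sigmoid"),                # confidence that a fracture is present
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

How such a network acquires useful filter values, rather than the random ones it starts with, is the subject of the next section.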

How a CNN Is Created

Overview: Teaching Through Testing
Traditionally, software is created by humans writing precise instructions for executing a task (e.g., if "this," then "do that," etc.). In the case of human vision, however, we do not really know how to express these instructions. For example, imagine trying to write a computer program that can enable computers to see fractures in radiographs (e.g., if "cortical disruption here" then "femoral neck fracture," or if "jagged edge there" then "intertrochanteric fracture"). Approaching the problem in this manner would make for an endless task of describing every possible fracture pattern ever exhibited (or possible) within the human skeleton. This provides a seeming impasse when we attempt to create a computer that can "see" fractures in radiographs: We cannot write instructions for what we cannot comprehensively explain ourselves. However, by starting with the presumption that a mathematical model must exist to do a given task (e.g., fracture detection), then the challenge changes. Rather than trying to write a program that sees radiographs as a human does, we instead write a computer program that will find a mathematical model (i.e., a mathematical function) that can "see" fractures in radiographs. This is the premise behind all of machine learning—that is, that reality can be represented with a mathematical function that we do not know in advance but that we can attempt to estimate through iterative testing and adjustments2.

Creating a program to find a mathematical model that can "see" fractures requires 3 things: a rough draft model, data, and a teacher to teach the model from that data. Trying to write a teacher program that knows how to "see" only returns to the problem of explaining vision. So, instead, the programmer does not use a teacher at all. Rather, the teacher becomes a proctor. It does not teach, it tests. It does this testing with data. The programmer provides the proctor with data in the form of tests and answer keys (e.g., inputting thousands of radiographs and indicating which ones contain fractures). The first draft model being tested is a CNN populated by mathematical functions with random values, and not surprisingly, it scores very poorly on its first test: few fractures are accurately identified. But this is just the starting point. Given the poor test result, the proctor program looks at the CNN and suggests what changes in mathematical functions would have given a better score on the test. For a CNN, this translates to tasks such as changing what kind of shapes the filters should find in the
radiographs and how much each neuron should weight the input from different earlier neurons. The CNN is updated with these adjustments, and a brand-new version of the test is then given with the intention that this updated version of the CNN will achieve a better score by accurately identifying a larger share of radiographs with fractures. This cycle is iterative. It continues until the accuracy of the CNN model levels off and the training is ended. A CNN is said to be successfully trained when it can find the most important patterns of the training data, apply those patterns to new prospective data in the real world, and maintain an acceptable degree of accuracy.

A More Detailed Look: Born from Data, Not Logic
The process of repeatedly testing and adjusting the mathematical functions in the model is called supervised learning2. A model starts off with no preconceived notions; all filters and neuron functions are initialized with random values. Initially, when an image is inputted, the output is random as well. The model needs to be "trained" on a task. The training images are typically inputted in small batches. After each batch, the functions and filters of the model are updated by the proctor program to achieve better accuracy. At some point, the system reaches its accuracy limit. The final accuracy is determined by the difficulty of the task, the available training data, and the sophistication of the model.
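To connect this description to practice, the sketch below shows what the test-and-adjust cycle looks like with the Keras API: a loss function scores each small batch (the "test"), and an optimizer plays the role of the proctor, nudging the filters and weights after every batch. The arrays train_images and train_labels are placeholders standing in for a labeled radiograph data set; no real data or published model is used here.

```python
import numpy as np
from tensorflow.keras import Sequential, layers

# Placeholder data: in practice these would be thousands of labeled
# radiographs (the "tests") and fracture/no-fracture labels (the "answer key").
train_images = np.random.rand(128, 256, 256, 1)        # hypothetical example images
train_labels = np.random.randint(0, 2, size=(128, 1))  # hypothetical 0/1 labels

model = Sequential([
    layers.Input(shape=(256, 256, 1)),
    layers.Conv2D(16, 3, activation="relu"),   # filters start with random values
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),     # confidence that a fracture is present
])
# The loss function scores each "test"; the optimizer acts as the proctor,
# adjusting every filter and weight toward a better score.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Images are fed in small batches; after each batch the weights are updated,
# and the cycle repeats epoch after epoch until accuracy levels off.
model.fit(train_images, train_labels, batch_size=16, epochs=5, validation_split=0.2)
```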
Since machine learning is a methodology for creating computer programs through examples rather than logic, a model is only as good as the data it is trained on. Too little data and the model may "memorize" the data it trained on, and therefore not predict well on new data—a concept referred to as overfitting2. Even with substantial amounts of data, those data may not have enough difficult radiographs to permit learning how to manage challenging cases. Similarly, it is common for the task itself to have accuracy limits. Models are often trained to do a task that humans themselves cannot agree on (some radiographs are difficult enough that 2 experts disagree). Thus, a model trained to match expert opinion will naturally reflect these limits of human judgement in its own accuracy.

While the specific filters of a CNN are not developed by a human programmer, the model architecture (the kind and count of neurons that make up each layer) is created by the programmer. Just as the building materials available determine how complex and what kind of object may be built, the internal architecture of the CNN determines limits of what it can and cannot "see." The exploration of different architectures is what consumes much of the research and model development efforts.

TABLE III Grades of Recommendation

Deep learning has demonstrated remarkable progress in the analysis of medical imaging across a range of modalities that are commonly used in orthopaedics. Grade of recommendation: I.

There is a growing body of evidence showing clinical utility for deep learning in musculoskeletal radiography, as evidenced by studies that use deep learning to achieve expert or near-expert-level performance for the identification and localization of fractures on radiographs. Grade of recommendation: I.

Deep learning is now in the very early stages of entering the clinical setting, involving validation and proof-of-concept studies. Grade of recommendation: I.

*According to Wright54, grade A indicates good evidence (Level-I studies with consistent findings) for or against recommending intervention; grade B, fair evidence (Level-II or III studies with consistent findings) for or against recommending intervention; grade C, poor-quality evidence (Level-IV or V studies with consistent findings) for or against recommending intervention; and grade I, insufficient or conflicting evidence not allowing a recommendation for or against intervention.

Conclusions
CNNs have shown great promise in many areas of computer vision, including the interpretation of medical images, and will soon be entering the clinical setting for validation and proof of concept (Table III). A CNN does not know what an image is; it does not perceive an image as humans do. Rather, a CNN reads pixels as numerical values of brightness and proximity. A CNN is "neural" in that its mathematical structure is inspired by biologic neural networks. This enables a CNN to extract visual features (i.e., shapes) from an image that can ultimately be used to make useful predictions about the image. In the case of fracture identification, a machine "reads" a radiograph by applying a CNN trained for fracture detection to predict the likelihood of the image containing a fracture.

Source of Funding
There was no source of funding for this study.

Brandon G. Hill, MSc1
Justin D. Krogue, MD2,3
David S. Jevsevar, MD, MBA1,4
Peter L. Schilling, MD, MSc1,4

1Dartmouth Hitchcock Medical Center, Lebanon, New Hampshire
2Google Health, Palo Alto, California
3Department of Orthopaedic Surgery, University of California San Francisco, San Francisco, California
4The Geisel School of Medicine at Dartmouth, Hanover, New Hampshire

Email for corresponding author: peter.leif.schilling@gmail.com

References

1. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Commun ACM. 2017 Jun;60(6):84-90.
2. Bishop CM. Pattern recognition and machine learning. Springer; 2006.
3. Goodfellow I, Bengio Y, Courville A. Deep learning. MIT Press; 2016.
4. Samuel AL. Some studies in machine learning using the game of checkers. IBM J Res Develop. 1959;3(3):210-29.
5. Domingos P. The master algorithm: how the quest for the ultimate learning machine will remake our world. Basic Books; 2015.
6. Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature. 1986;323(6088):533-6.
7. Condliffe J. In 2016, AI Home Assistants Won Our Hearts. 2016 Dec 20. Accessed February 18, 2022. https://www.technologyreview.com/2016/12/20/155032/in-2016-ai-home-assistants-won-our-hearts/
8. Dahl GE, Yu D, Deng L, Acero A. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition. IEEE Trans Audio Speech Lang Process. 2012 Jan;20(1):30-42.
9. Fagnant DJ, Kockelman K. Preparing a nation for autonomous vehicles: opportunities, barriers and policy recommendations. Transp Res Part A Policy Pract. 2015;77:167-81.
10. Hinton G, Deng L, Yu D, Dahl GE, Mohamed A-R, Jaitly N, Senior A, Vanhoucke V, Nguyen P, Sainath TN, Kingsbury B. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Process Mag. 2012 Nov;29(6):82-97.
11. Lewis-Kraus G. The Great A.I. Awakening. 2016 Dec 14. https://www.nytimes.com/2016/12/14/magazine/the-great-ai-awakening.html
12. Mikolov T, Deoras A, Povey D, Burget L, Černocký J. Strategies for training large scale neural network language models. In: Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding; 2011 Dec 11-15. Institute of Electrical and Electronics Engineers; 2011. p 196-201.
13. Sallab AE, Abdou M, Perot E, Yogamani S. Deep Reinforcement Learning Framework for Autonomous Driving. In: Proceedings of the Imaging Science and Technology International Symposium on Electronic Imaging: Autonomous Vehicles and Machine; 2017. Society for Imaging Science and Technology; 2017. p 70-6.

14. Thrun S, Montemerlo M, Dahlkamp H, Stavens D, Aron A, Diebel J, Fong P, Gale J, Halpenny M, Hoffmann G, Lau K, Oakley C, Palatucci M, Pratt V, Stang P, Strohband S, Dupont C, Jendrossek L-E, Koelen C, Markey C, Rummel C, van Niekerk J, Jensen E, Alessandrini P, Bradski G, Davies B, Ettinger S, Kaehler A, Nefian A, Mahoney P. Stanley: The robot that won the DARPA Grand Challenge. J Field Robot. 2006;23(9):661-92.
15. Upson S. The A.I. Takeover Is Coming. Let's Embrace It. 2016 Dec 22. https://www.wired.com/2016/12/the-ai-takeover-is-coming-lets-embrace-it/
16. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015 May 28;521(7553):436-44.
17. Ma J, Sheridan RP, Liaw A, Dahl GE, Svetnik V. Deep neural nets as a method for quantitative structure-activity relationships. J Chem Inf Model. 2015 Feb 23;55(2):263-74.
18. Horng MH, Kuok CP, Fu MJ, Lin CJ, Sun YN. Cobb Angle Measurement of Spine from X-Ray Images Using Convolutional Neural Network. Comput Math Methods Med. 2019 Feb 19;2019:6357171.
19. Liu F, Zhou Z, Samsonov A, Blankenbaker D, Larison W, Kanarek A, Lian K, Kambhampati S, Kijowski R. Deep Learning Approach for Evaluating Knee MR Images: Achieving High Diagnostic Performance for Cartilage Lesion Detection. Radiology. 2018 Oct;289(1):160-9.
20. Nguyen TP, Chae D-S, Park S-J, Kang K-Y, Yoon J. Deep learning system for Meyerding classification and segmental motion measurement in diagnosis of lumbar spondylolisthesis. Biomed Signal Process Control. 2021;65:102371.
21. Norman B, Pedoia V, Noworolski A, Link TM, Majumdar S. Applying Densely Connected Convolutional Neural Networks for Staging Osteoarthritis Severity from Plain Radiographs. J Digit Imaging. 2019 Jun;32(3):471-7.
22. Pierson E, Cutler DM, Leskovec J, Mullainathan S, Obermeyer Z. An algorithmic approach to reducing unexplained pain disparities in underserved populations. Nat Med. 2021 Jan;27(1):136-40.
23. Tiulpin A, Thevenot J, Rahtu E, Lehenkari P, Saarakkala S. Automatic Knee Osteoarthritis Diagnosis from Plain Radiographs: A Deep Learning-Based Approach. Sci Rep. 2018 Jan 29;8(1):1727.
24. Tolpadi AA, Lee JJ, Pedoia V, Majumdar S. Deep Learning Predicts Total Knee Replacement from Magnetic Resonance Images. Sci Rep. 2020 Apr 14;10(1):6371.
25. Zheng Q, Shellikeri S, Huang H, Hwang M, Sze RW. Deep Learning Measurement of Leg Length Discrepancy in Children Based on Radiographs. Radiology. 2020 Jul;296(1):152-8.
26. Grauhan NF, Niehues SM, Gaudin RA, Keller S, Vahldiek JL, Adams LC, Bressem KK. Deep learning for accurately recognizing common causes of shoulder pain on radiographs. Skeletal Radiol. 2022 Feb;51(2):355-62.
27. Halabi SS, Prevedello LM, Kalpathy-Cramer J, Mamonov AB, Bilbily A, Cicero M, Pan I, Pereira LA, Sousa RT, Abdala N, Kitamura FC, Thodberg HH, Chen L, Shih G, Andriole K, Kohli MD, Erickson BJ, Flanders AE. The RSNA Pediatric Bone Age Machine Learning Challenge. Radiology. 2019 Feb;290(2):498-503.
28. Hsieh CI, Zheng K, Lin C, Mei L, Lu L, Li W, Chen FP, Wang Y, Zhou X, Wang F, Xie G, Xiao J, Miao S, Kuo CF. Automated bone mineral density prediction and fracture risk assessment using plain radiographs via deep learning. Nat Commun. 2021 Sep 16;12(1):5472.
29. Karnuta JM, Haeberle HS, Luu BC, Roth AL, Molloy RM, Nystrom LM, Piuzzi NS, Schaffer JL, Chen AF, Iorio R, Krebs VE, Ramkumar PN. Artificial Intelligence to Identify Arthroplasty Implants From Radiographs of the Hip. J Arthroplasty. 2021 Jul;36(7S):S290-S294.
30. Karnuta JM, Luu BC, Roth AL, Haeberle HS, Chen AF, Iorio R, Schaffer JL, Mont MA, Patterson BM, Krebs VE, Ramkumar PN. Artificial Intelligence to Identify Arthroplasty Implants From Radiographs of the Knee. J Arthroplasty. 2021 Mar;36(3):935-40.
31. Krogue JD, Cheng KV, Hwang KM, Toogood P, Meinberg EG, Geiger EJ, Zaid M, McGill KC, Patel R, Sohn JH, Wright A, Darger BF, Padrez KA, Ozhinsky E, Majumdar S, Pedoia V. Automatic Hip Fracture Identification and Functional Subclassification with Deep Learning. Radiol Artif Intell. 2020 Mar 25;2(2):e190023.
32. Rouzrokh P, Ramazanian T, Wyles CC, Philbrick KA, Cai JC, Taunton MJ, Maradit Kremers H, Lewallen DG, Erickson BJ. Deep Learning Artificial Intelligence Model for Assessment of Hip Dislocation Risk Following Primary Total Hip Arthroplasty From Postoperative Radiographs. J Arthroplasty. 2021 Jun;36(6):2197-2203.e3.
33. Shah RF, Bini SA, Martinez AM, Pedoia V, Vail TP. Incremental inputs improve the automated detection of implant loosening using machine-learning algorithms. Bone Joint J. 2020 Jun;102-B(6_Supple_A):101-6.
34. von Schacky CE, Wilhelm NJ, Schäfer VS, Leonhardt Y, Gassert FG, Foreman SC, Gassert FT, Jung M, Jungmann PM, Russe MF, Mogler C, Knebel C, von Eisenhart-Rothe R, Makowski MR, Woertler K, Burgkart R, Gersing AS. Multitask Deep Learning for Segmentation and Classification of Primary Bone Tumors on Radiographs. Radiology. 2021 Nov;301(2):398-406.
35. Yamamoto N, Sukegawa S, Kitamura A, Goto R, Noda T, Nakano K, Takabatake K, Kawai H, Nagatsuka H, Kawasaki K, Furuki Y, Ozaki T. Deep Learning for Osteoporosis Classification Using Hip Radiographs and Patient Clinical Covariates. Biomolecules. 2020 Nov 10;10(11):E1534.
36. Chen HY, Hsu BW, Yin YK, Lin FH, Yang TH, Yang RS, Lee CK, Tseng VS. Application of deep learning algorithm to detect and visualize vertebral fractures on plain frontal radiographs. PLoS One. 2021 Jan 28;16(1):e0245992.
37. Cheng CT, Ho TY, Lee TY, Chang CC, Chou CC, Chen CC, Chung IF, Liao CH. Application of a deep learning algorithm for detection and visualization of hip fractures on plain pelvic radiographs. Eur Radiol. 2019 Oct;29(10):5469-77.
38. Chung SW, Han SS, Lee JW, Oh KS, Kim NR, Yoon JP, Kim JY, Moon SH, Kwon J, Lee HJ, Noh YM, Kim Y. Automated detection and classification of the proximal humerus fracture by using deep learning algorithm. Acta Orthop. 2018 Aug;89(4):468-73.
39. Jones RM, Sharma A, Hotchkiss R, Sperling JW, Hamburger J, Ledig C, O'Toole R, Gardner M, Venkatesh S, Roberts MM, Sauvestre R, Shatkhin M, Gupta A, Chopra S, Kumaravel M, Daluiski A, Plogger W, Nascone J, Potter HG, Lindsey RV. Assessment of a deep-learning system for fracture detection in musculoskeletal radiographs. NPJ Digit Med. 2020 Oct 30;3:144.
40. Kim DH, MacKinnon T. Artificial intelligence in fracture detection: transfer learning from deep convolutional neural networks. Clin Radiol. 2018 May;73(5):439-45.
41. Kitamura G, Chung CY, Moore BE 2nd. Ankle Fracture Detection Utilizing a Convolutional Neural Network Ensemble Implemented with a Small Sample, De Novo Training, and Multiview Incorporation. J Digit Imaging. 2019 Aug;32(4):672-7.
42. Urakawa T, Tanaka Y, Goto S, Matsuzawa H, Watanabe K, Endo N. Detecting intertrochanteric hip fractures with orthopedist-level accuracy using a deep convolutional neural network. Skeletal Radiol. 2019 Feb;48(2):239-44.
43. Thian YL, Li Y, Jagmohan P, Sia D, Chan VEY, Tan RT. Convolutional Neural Networks for Automated Fracture Detection and Localization on Wrist Radiographs. Radiol Artif Intell. 2019 Jan 30;1(1):e180001.
44. Mosquera C, Binder F, Diaz FN, Seehaus A, Ducrey G, Ocantos JA, Aineseder M, Rubin L, Rabinovich DA, Quiroga AE, Martinez B, Beresnak AD, Benitez SE, Luna DR. Integration of a deep learning system for automated chest x-ray interpretation in the emergency department: A proof-of-concept. Intelligence-Based Medicine. 2021;5:100039.
45. Lindsey R, Daluiski A, Chopra S, Lachapelle A, Mozer M, Sicular S, Hanel D, Gardner M, Gupta A, Hotchkiss R, Potter H. Deep neural network improves fracture detection by clinicians. Proc Natl Acad Sci U S A. 2018 Nov 6;115(45):11591-6.
46. Fukushima K. Neocognitron: a self organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol Cybern. 1980;36(4):193-202.
47. Hubel DH, Wiesel TN. Receptive fields of single neurones in the cat's striate cortex. J Physiol. 1959 Oct;148:574-91.
48. McCulloch WS, Pitts W. A logical calculus of the ideas immanent in nervous activity. 1943. Bull Math Biol. 1990;52(1-2):99-115, discussion 73-97.
49. Forsyth DA, Ponce J. Computer vision: a modern approach. 2nd ed. Pearson; 2012.
50. Kirsch RA. SEAC and the start of image processing at the National Bureau of Standards. IEEE Ann Hist Comput. 1998;20(2):7-13.
51. Szeliski R. Computer Vision: Algorithms and Applications. 1st ed. Springer; 2011.
52. Weng J, Ahuja N, Huang TS. Learning recognition and segmentation of 3-D objects from 2-D images. In: Proceedings of the 1993 (4th) International Conference on Computer Vision; 1993 May 11-14. Institute of Electrical and Electronics Engineers; 1993. p 121-8.
53. Yamaguchi K, Sakamoto K, Akabane T, Fujimoto Y. A neural network for speaker-independent isolated word recognition. In: Proceedings of the First International Conference on Spoken Language Processing; 1990 Nov 18-22. International Conference on Spoken Language Processing; 1990. p 1077-80.
54. Wright JG. Revised grades of recommendation for summaries or reviews of orthopaedic surgical studies. J Bone Joint Surg Am. 2006 May;88(5):1161-2.
