
applied sciences
Article
Deep Convolutional Neural Networks Based Analysis
of Cephalometric Radiographs for Differential
Diagnosis of Orthognathic Surgery Indications
Ki-Sun Lee 1,2,3, *, Jae-Jun Ryu 4 , Hyon Seok Jang 5 , Dong-Yul Lee 6 and Seok-Ki Jung 7, *
1 Department of Biomedical Engineering, College of Medicine, Seoul National University, Seoul 03080, Korea
2 Department of Clinical Dentistry, College of Medicine, Korea University, Seoul 02841, Korea
3 Department of Prosthodontics, Korea University Ansan Hospital, Gyung-gi do 15355, Korea
4 Department of Prosthodontics, Korea University Anam Hospital, Seoul 02841, Korea; koprosth@unitel.co.kr
5 Department of Oral and Maxillofacial Surgery, Korea University Ansan Hospital, Gyung-gi do 15355, Korea;
omfs1109@korea.ac.kr
6 Department of Orthodontics, Korea University Guro Hospital, Seoul 08308, Korea; dong09@korea.ac.kr
7 Department of Orthodontics, Korea University Ansan Hospital, Gyung-gi do 15355, Korea
* Correspondence: kisuns@gmail.com (K.-S.L.); jgosggg@korea.ac.kr (S.-K.J.)

Received: 13 February 2020; Accepted: 11 March 2020; Published: 20 March 2020

Abstract: The aim of this study was to evaluate deep convolutional neural network (DCNN)-based
analysis of cephalometric radiographs for the differential diagnosis of the indications of
orthognathic surgery. Among the DCNNs, Modified-Alexnet, MobileNet, and Resnet50 were
used, and the accuracy of the models was evaluated by performing 4-fold cross validation.
Additionally, gradient-weighted class activation mapping (Grad-CAM) was used to perform visualized
interpretation to determine which region affected the DCNNs’ class classification. The prediction
accuracy of the models was 96.4% for Modified-Alexnet, 95.4% for MobileNet, and 95.6% for Resnet50.
According to the Grad-CAM analysis, the most influential regions for the DCNNs’ class classification
were the maxillary and mandibular teeth, mandible, and mandibular symphysis. This study suggests
that DCNNs-based analysis of cephalometric radiograph images can be successfully applied for
differential diagnosis of the indications of orthognathic surgery.

Keywords: artificial intelligence; convolutional neural networks; cephalometric radiographs; orthognathic surgery

1. Introduction
Diagnosing the need for orthognathic surgery is an important issue in the field of orthodontics
and oral and maxillofacial surgery [1]. Orthognathic surgery is expensive and associated with the risks
of general anesthesia. Therefore, patients tend to prefer orthodontic treatment to orthognathic surgery.
However, there are cranial structural problems that cannot be solved by orthodontic treatment alone.
A prominent chin, retrusive mandible, and jaw asymmetry can only be corrected by orthognathic
surgery. Orthognathic surgery is also chosen when there is a limit to the esthetic improvement that can
be achieved with orthodontic treatment alone [2,3].
The choice of orthognathic surgery depends on the evaluation of the objective data and not just
the patient’s decision. The determination of whether a patient’s skeletal problem can be solved by an
orthodontic approach or whether it requires a surgical approach is a very sensitive issue [4]. Experienced
orthodontists and oral surgeons can judge the need for surgery in a given case, but this can be difficult
if the clinician is less experienced or if it is a borderline case.


Several clinical data are selected for this differential diagnosis [5], of which cephalometric
radiographs are the most important radiological data used to determine the antero-posterior skeletal
problems of patients [6,7]. For most surgical evaluations, determining whether a patient has a
prominent jaw or retrusive jaw is most critical, and cephalometric radiographs are used to determine
whether the difference between the maxilla and mandible can be overcome by orthodontic treatment
alone or whether surgery is necessary.
There are approximately 50 landmarks used in cephalometric radiographs, and various
measurements, such as line lengths and angles derived from these landmarks, are used for analysis [8,9].
Different clinicians may consider different measurements important and analyze them empirically in
light of their professional experience. This can be a difficult process for less experienced clinicians,
and it can be hard for them to have confidence in their decisions.
Furthermore, these measurements cannot simply be taken at face value. Even the
sella–nasion–A point (SNA) angle and the sella–nasion–B point (SNB) angle, which express the
anteroposterior positions of the maxilla and the mandible, may show different values for the same
skeletal state because of variation in the position of the nasion [10]. Therefore, in
measurement-based evaluation, the supplementary judgment of the clinician is important.
Artificial intelligence (AI), especially deep learning using convolutional neural networks (CNNs),
has become one of the most sought-after fields [11–14]. With the development of computing hardware and
graphics processing units, computational processing speed has increased, complex calculations can
be performed in a short time, and deep learning using deep convolutional neural networks (DCNNs)
has become widely practical [15,16]. Progress in deep learning for image classification has been
particularly striking; in the 2015 ImageNet Large Scale Visual Recognition Challenge (ILSVRC),
it surpassed human-level performance [17].
Various deep learning neural network models are being developed and applied to many fields.
Image pattern recognition and classification problems arise in practically every domain; hence, the potential
applications of deep learning are nearly endless [18]. The medical field is no exception, and deep learning is being applied
to different issues, including diagnosis, in several clinical fields [19,20].
There have been many studies using various AI models in orthodontics [21,22]. When determining
the need for extraction, there were studies that applied machine-learning models to predict the result
by constructing neural networks using landmark measurements [23,24]. These studies had the
disadvantage of having to manually measure the landmark points, calculate the measurement values,
and enter them at the input stage in order to apply the neural network model. Another disadvantage
was that information could be missed by entering only certain measurements selected by the person
composing the neural network.
Therefore, in this study, a deep learning model that takes the whole cephalometric radiograph image
as its input was used for diagnosis, rather than a machine-learning model based on specific
measurement values. Through this, we tried to obtain a better-performing diagnostic model by
reducing the errors that can occur during the pre-processing step and by capturing information that
cannot be conveyed through measured values.
The purpose of this study was to create a model for the differential diagnosis of orthognathic
surgery and orthodontic treatment with a deep learning algorithm using cephalometric radiographs.
Moreover, we intended to evaluate the performance of three deep learning models that have different
structures and to compare the factors that help improve their performance.

2. Materials and Methods


A total of 333 individuals who visited the Korea University Ansan Hospital for orthodontic
evaluation between 2014 and 2019 were enrolled in this study. Cephalometric radiograph images were
used for clinical examination, and 159 and 174 patients were indicated for orthognathic surgery and
orthodontic treatment, respectively (Table 1).

Table 1. Characteristics of the samples.

Parameter                   Orthodontic Treatment   Orthognathic Surgery   Total
Number of patients          159                     174                    333
Number of females/males     88/71                   93/81                  181/152
Mean age (SD)               22.7 (5.8)              23.4 (4.9)             23.1 (5.1)
The datasets were randomly split. Forty cases were used solely to evaluate the model, while the
training and validation sets were comprised of the remaining 293 cases. The training set consisted of
220 cases and the validation set consisted of 73 cases. This research protocol was approved by the
Institutional Review Board, Korea University Ansan Hospital (no. 2020AS0062).
In the data pre-processing step, 50 landmarks were detected with an automated landmark
detection method based on the VGG16 model. All cephalometric radiographs were aligned using the
Frankfort-horizontal plane, and a minimum box containing all 50 landmarks was selected (Figure 1).
Following this, a square box was selected as the lower part where the main structures including the
maxilla and mandible could be included. The data were cropped by giving a margin of 10%. The data
size was down-sampled to a 256 × 256 pixel size.
Figure 1. (a) Landmarks used in this study: 1, sella; 2, nasion; 3, porion; 4, orbitale; 5, basion; 6,
articulare; 7, pterygomaxillary fissure; 8, anterior nasal spine; 9, posterior nasal spine; 10, A point;
11, B point; 12, PM; 13, pogonion; 14, gnathion; 15, menton; 16, maxillary incisor tip; 17, maxillary
incisor root; 18, mandibular incisor tip; 19, mandibular incisor root; 20, maxillary first molar distal;
21, mandibular first molar distal; 22, antegonial notch; 23, corpus left; 24, gonion; 25, ramus down;
26, condylion; 27, R3; 28, R1; 29, glabella; 30, soft tissue nasion; 31, nasal bridge point; 32, dorsum of
nose; 33, pronasale; 34, columella; 35, subnasale; 36, soft tissue A; 37, labrale superius; 38, upper lip; 39,
stomion superius; 40, upper embrasure; 41, lower embrasure; 42, stomion inferius; 43, lower lip; 44,
labrale inferius; 45, soft tissue B; 46, soft tissue pogonion; 47, soft tissue gnathion; 48, soft tissue menton;
49, submandibular point; and 50, cervical point. (b) Minimum box selection including all landmarks.
(c) Square box selection including the inferior part and down-sampling.
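As a concrete illustration, the following is a minimal Python/OpenCV sketch of the cropping and
down-sampling step described above (bounding box around the landmarks, 10% margin, square box
anchored toward the lower structures, and resizing to 256 × 256 pixels). The function name, the
bottom-anchored square box, and the border clamping are assumptions; the study does not publish
its pre-processing code.

```python
# A minimal sketch of the crop-and-downsample step (assumptions noted above).
import cv2
import numpy as np

def crop_and_downsample(image, landmarks, margin=0.10, out_size=256):
    """image: 2-D grayscale cephalogram; landmarks: (50, 2) array of (x, y) points."""
    x_min, y_min = landmarks.min(axis=0)
    x_max, y_max = landmarks.max(axis=0)

    # Expand the minimum bounding box by a 10% margin on each side.
    w, h = x_max - x_min, y_max - y_min
    x_min, x_max = x_min - margin * w, x_max + margin * w
    y_min, y_max = y_min - margin * h, y_max + margin * h

    # Take a square box with side equal to the box width, anchored at the bottom
    # so that the maxilla and mandible are included (anchoring is an assumption).
    side = x_max - x_min
    y_top = y_max - side

    x0, y0 = int(max(x_min, 0)), int(max(y_top, 0))
    x1, y1 = int(min(x_max, image.shape[1])), int(min(y_max, image.shape[0]))
    crop = image[y0:y1, x0:x1]

    # Down-sample to 256 x 256 pixels.
    return cv2.resize(crop, (out_size, out_size), interpolation=cv2.INTER_AREA)
```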
The three deep learning models used in this study were Modified-Alexnet, MobileNet, and Resnet50
(Figure 2).

Figure 2. The three deep learning models used in this study were (a) Modified-Alexnet, (b) MobileNet,
and (c) Resnet50.

Modified-Alexnet combined the separated modules in the basic Alexnet for ease of implementation,
and had an input image size of 227 × 227 pixels [25]. MobileNet and Resnet50 had a default size of
224 × 224 pixels for the input image [26].
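The following is a minimal Keras sketch of how the two transfer-learning backbones could be
assembled. Only the backbone choice, the 224 × 224 input size, the ImageNet initialization, and the
two-class task come from the text; the classification head, the three-channel input, and all other
details are assumptions. The Modified-Alexnet used in the study was a custom network and is not
reproduced here.

```python
# A minimal Keras sketch of the transfer-learning models (assumptions noted above).
from tensorflow.keras import layers, models, applications

def build_transfer_model(backbone="resnet50"):
    # Single-channel radiographs would need to be stacked to three channels
    # to reuse ImageNet weights (assumption about the input pipeline).
    if backbone == "resnet50":
        base = applications.ResNet50(include_top=False, weights="imagenet",
                                     input_shape=(224, 224, 3), pooling="avg")
    else:
        base = applications.MobileNet(include_top=False, weights="imagenet",
                                      input_shape=(224, 224, 3), pooling="avg")
    # Two-class output: orthognathic surgery vs. orthodontic treatment.
    out = layers.Dense(2, activation="softmax")(base.output)
    return models.Model(inputs=base.input, outputs=out)
```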
Modified-Alexnet used He-normal initialization as the initial weights, and MobileNet and Resnet50
used the weights provided by ImageNet as the initial weights. The training was conducted using the
stochastic gradient descent (SGD) optimizer [1,27]. The initial learning rate was set at 0.002 and the
batch size was set at 32. A total of 150 epochs were performed, and the learning rate was adjusted by
a factor of 0.1 when the validation loss value did not improve by more than 1e-6 during 20 epochs.
At the point when validation accuracy no longer increased significantly, the training was terminated
and the model was evaluated.
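A minimal sketch of this training schedule, assuming a Keras workflow and the hypothetical
`model`, `train_images`, `train_labels`, `val_images`, and `val_labels` objects from the previous
sketch; the early-stopping patience and any callback arguments not stated in the text are assumptions.

```python
# SGD with initial learning rate 0.002, batch size 32, up to 150 epochs, learning
# rate reduced by a factor of 0.1 when validation loss fails to improve by more
# than 1e-6 for 20 epochs, and early stopping when validation accuracy plateaus.
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping

model.compile(optimizer=SGD(learning_rate=0.002),
              loss="categorical_crossentropy", metrics=["accuracy"])

callbacks = [
    ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=20, min_delta=1e-6),
    EarlyStopping(monitor="val_accuracy", patience=20, restore_best_weights=True),
]

history = model.fit(train_images, train_labels,
                    validation_data=(val_images, val_labels),
                    batch_size=32, epochs=150, callbacks=callbacks)
```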
The data inputs of the 256 × 256 pixel images were randomly cropped to fit each model (227 × 227
pixels for Modified-Alexnet and 224 × 224 pixels for both MobileNet and Resnet50) and randomly
flipped horizontally to overcome overfitting. Dropout and batch normalization were also implemented.
Equalization pre-processing was performed prior to input, and normalization was performed to set
the maximum value of the image to 1. Learning sets were randomly divided into four sets for 4-fold
cross validation, and each success rate was evaluated twice and statistically processed [28].
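A minimal sketch of the equalization, scaling, random crop/flip augmentation, and 4-fold split
described above; the helper names and the hypothetical `images` array are assumptions, and the
per-fold training loop is omitted.

```python
# Equalization, max-to-1 scaling, random crop/flip, and 4-fold cross-validation (sketch).
import numpy as np
import cv2
from sklearn.model_selection import KFold

def preprocess(img):
    img = cv2.equalizeHist(img.astype(np.uint8))   # equalization pre-processing
    return img.astype(np.float32) / img.max()      # scale so the maximum value is 1

def random_crop_flip(img, crop=224):
    # Randomly crop the 256 x 256 input to the model's input size and flip horizontally.
    top = np.random.randint(0, img.shape[0] - crop + 1)
    left = np.random.randint(0, img.shape[1] - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    if np.random.rand() < 0.5:
        patch = patch[:, ::-1]
    return patch

# 4-fold cross-validation over the learning set (indices only).
for fold, (train_idx, val_idx) in enumerate(KFold(n_splits=4, shuffle=True).split(images)):
    print(f"fold {fold}: {len(train_idx)} training cases, {len(val_idx)} validation cases")
```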
We also used the gradient-weighted class activation mapping (Grad-CAM) technique to see which
regions the model attended to [29,30]. By expressing differences in color, it was possible to know
which area contributed most to the judgment of the AI model. By displaying the region of interest
(ROI), the model became a more explainable AI model.
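A minimal sketch of Grad-CAM for one of the trained models, following the general recipe of
Selvaraju et al. [29]; the chosen layer name (the last convolutional block of the Keras ResNet50
backbone) and the surrounding code are assumptions about the implementation.

```python
# Grad-CAM heat map for a single input image (sketch, assumptions noted above).
import numpy as np
import tensorflow as tf

def grad_cam(model, image, class_index, conv_layer="conv5_block3_out"):
    grad_model = tf.keras.Model(model.input,
                                [model.get_layer(conv_layer).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis])
        score = preds[:, class_index]
    grads = tape.gradient(score, conv_out)                 # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))           # global-average-pooled gradients
    cam = tf.reduce_sum(weights[:, None, None, :] * conv_out, axis=-1)
    cam = tf.nn.relu(cam)[0]
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()     # heat map scaled to [0, 1]
```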

3. Results
The average performance of Modified-Alexnet after two sets of 4-fold cross validation was 97.6% in
the training set, 95.6% in the validation set, 91.9% in the test set, and 96.4% in total. For MobileNet,
it was 98.5% in the training set, 92.7% in the validation set, 83.8% in the test set, and 95.4% in total,
and for Resnet50 it was 98.1% in the training set, 94.5% in the validation set, 83.8% in the test set, and
95.6% in total (Figure 3).

Figure 3. Means (a) and standard deviations (b) of prediction accuracy for the three models.

The standard deviations of the success rate for Modified-Alexnet, MobileNet, and Resnet50 were
1.5%, 1.7%, and 2.1% in the training set; 3.5%, 7.6%, and 6.1% in the validation set; and 1.1%, 2.2%,
and 4.7% in the test set, respectively.
The screening performances of the three DCNN models tested in this study are displayed in
Table 2. It can be observed that Modified-Alexnet achieved the highest performance, with an accuracy
of 0.919 (95% CI 0.888 to 0.949), sensitivity of 0.852 (95% CI 0.811 to 0.893), and specificity of 0.973
(95% CI 0.956 to 0.991).
Table 2. Orthognathic surgery screening performance of three models in this study.

Model              AUC (95% CI)     Accuracy (95% CI)   Sensitivity (95% CI)   Specificity (95% CI)
Modified-Alexnet   0.969 (±0.019)   0.919 (±0.030)      0.852 (±0.041)         0.973 (±0.017)
MobileNet          0.908 (±0.032)   0.838 (±0.429)      0.761 (±0.051)         0.931 (±0.028)
Resnet50           0.923 (±0.030)   0.838 (±0.429)      0.750 (±0.052)         0.944 (±0.025)

Figure 4 shows the receiver operating characteristic (ROC) curves of all models. The area under
the ROC curve (AUC) was also calculated. The Modified-Alexnet model achieved the highest AUC of
0.969, while Resnet50 achieved an AUC of 0.923 and MobileNet achieved an AUC of 0.908.
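A minimal sketch of how these screening metrics could be computed from test-set predictions,
assuming hypothetical `y_true` (1 = orthognathic surgery) and `y_prob` arrays; the fold-based
confidence intervals reported in Table 2 are not reproduced here.

```python
# Accuracy, sensitivity, specificity, and AUC from binary test-set predictions (sketch).
from sklearn.metrics import roc_auc_score, confusion_matrix

def screening_metrics(y_true, y_prob, threshold=0.5):
    y_pred = (y_prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # surgery cases correctly flagged
        "specificity": tn / (tn + fp),   # orthodontic cases correctly identified
        "auc": roc_auc_score(y_true, y_prob),
    }
```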

Figure 4. Mean ROC curves of three deep convolutional neural network (DCNN) models used in
this study.
In cases of successful prediction, the ROI identified by class activation mapping (CAM) was
mainly focused on the maxillary and mandibular teeth, mandibular symphysis, and mandible. In cases
of prediction failure, the highlights were turned elsewhere or the ROI was scattered throughout,
indicating that they were not properly focused (Figure 5).

Figure 5. The region of interest of deep learning models using gradient-weighted class activation
mapping. (a) Success cases. (b) Failure cases.
4. Discussion
Most of the previous AI research related to the diagnosis of orthodontics was conducted by
considering the landmark points and calculating the measurement values [23,24]. This method is not
only influenced by the measurement value that is entered, but also has the disadvantage of introducing
many errors in the process of considering the landmark points. Recently, an automated landmark
detection method has been introduced, but its performance is comparable to or slightly less than
manual detection by specialists [31]. Furthermore, there is another disadvantage of overfitting, which
can easily occur when measurement values with similar meanings are used as inputs to the
machine-learning model. Even if automated landmark detection is extremely precise, there is a
limitation with the neural network model because selecting the inputs of the artificial neural
networks (ANNs) model is an ambiguous step. In this study, the precision of automated landmark
detection was not important because we used the landmark search only to find the range for cropping
the image. A deep learning algorithm extracts features of an image using convolution filters and
pooling layers and then analyzes patterns using them (Figure 6). Figure 6 shows how the features were
extracted after the original images in the top row passed the convolutional filters in the left column.
Many deep learning models have been improved and developed based on the filter size, type,
location and combination, and various ideas.
Figure 6. Extracted features after the original images (in the top row) passed the convolutional filters
(in the left column).
The Alexnet architecture was the winning model of the ILSVRC in 2012, and was a successful
model that led to the development of deep learning [25]. It is a basic model of deep learning with a
simple but powerful structure. Until recently, Resnet50 has been in the spotlight as a state-of-the-art
model for image classification problems [17]. MobileNet is a model that can be trained with relatively
simple calculations to make it easy to use with mobile phones, and its complexity is halfway between
Alexnet and Resnet50 [32].

In this study, Resnet50 and MobileNet showed that the fit of the training set was too fast and the
fit of the validation set followed slowly. On the other hand, Modified-Alexnet showed that the fit of
the training set was similar to that of the validation set (Figure 7). This also led to differences in
success rates for the test set between the three models.

Figure 7. Graphs of the performance of the accuracy and the loss function of each model.
(a) Modified-Alexnet, (b) MobileNet, and (c) Resnet50.
The ILSVRC is a competition that categorizes the entire image set into 1000 subclasses [17,25].
In this study, the cephalometric radiographs were classified into only two types, orthognathic surgery
and orthodontic treatment, without the complexity of 1000 subclasses. Therefore, more complex and
deeper models, such as Resnet50, were overfit for a simple problem. Modified-Alexnet, on the other
hand, does not have high complexity, but it includes several techniques for deep learning
(L2 regularization, dropout, etc.), which improve performance [25].

In the ILSVRC, it is possible to learn from a very large amount of data, but it is not easy to obtain a
sufficiently large dataset when there is a limit to data collection, as with medical data. In this study,
up to 2048 images could be generated from one image by randomly cropping and flipping at a basic
image size of 256 × 256 pixels. This process helped to make the learning in this study meaningful,
even with a small amount of data. This study provides hints on the strategies to use when there is a
limited amount of data.
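For reference, the figure of 2048 presumably follows the counting used for Alexnet [25]: a 224 × 224
patch can be cropped from a 256 × 256 image at (256 − 224)² = 1024 distinct offsets, and each crop can
also be mirrored horizontally, giving 1024 × 2 = 2048 variants of a single radiograph.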
If the clinician takes pre-processing steps to obtain the measured values by taking or tracing
the landmark points with the cephalometric radiographs, the precision or measured value of the
landmark point detection may vary depending on the clinician’s ability. However, diagnosing with the
image itself reduces the chances of such an intervention. Moreover, the information that has not been
considered previously can be considered together, which means that the meaning and effect of deep
learning can be obtained more reliably.
One of the biggest diagnostic differences between indications of orthognathic surgery and
orthodontic treatment is determining whether the skeletal differences between the maxilla and
mandible can be overcome [4]. In this study, Grad-CAM was used to determine whether the deep
learning AI model considered and evaluated the correct region. Grad-CAM is a generalized version of
CAM that can be applied to AI models without global average pooling [29,30]. The highlighted CAM
gives us insight into what the model is looking at and, in failure cases, why it failed. This was not
different from the regions that an actual clinician looks at and evaluates in the same situation.
A limitation of this study was that the comparison of performance was constrained by the lack of
data. When assessed with more data, different results may be obtained, which will need to be addressed
in future studies. Additionally, future studies should analyze the impact of better model construction
and compare the models with other architectures. For example, in the Resnet model, we can compare the
accuracy difference according to the number of blocks. Compared to conventional machine learning,
potential issues that deep learning may have include image quality problems, differences in X-ray
devices, and image distortion caused by disturbing factors such as dental prostheses.
The significance of this study is that it is the first study to perform differential diagnosis of the
indications of orthognathic surgery and orthodontic treatment based on images rather than measurements,
and it showed a significant success rate. Compared with previous studies conducted with ANNs,
the diagnostic performance of orthognathic surgery increased from 95.8% to 96.4% in total [33]. Most of
the artificial intelligence research in the field of orthodontics is still limited to ANNs. AI models
have been applied to the determination of extraction or not, extraction patterns, and anchorage
patterns [23,24]. The image feature analysis and classification technique using DCNNs has had an
enormous impact on the entire medical field, and the same is true in the field of orthodontics and
oral surgery.

5. Conclusions
In this study, the DCNN approaches to the cephalometric radiograph image-based differential
diagnosis for indications of orthognathic surgery were successfully applied and showed a high success
rate of 95.4%–96.4%. Visualization with Grad-CAM also showed the potential for explainable AI and
confirmed the ROI required for differential diagnosis. Differential diagnosis using the whole image
rather than evaluation with a specific measurement is a characteristic part of diagnosis using deep
learning, and it also has the advantage of taking into account subtleties that are not represented by
measurements. Therefore, the results of this study suggest that DCNNs could play an important role
in the differential diagnosis of the indications of orthognathic surgery. Further research will be needed
with the application of various deep learning structures, an appropriate number of datasets, and sets
of images with verified labels.

Author Contributions: Conceptualization, K.-S.L., J.-J.R., H.S.J., D.-Y.L., and S.-K.J.; data curation, K.-S.L. and
S.-K.J.; formal analysis, S.-K.J.; investigation, K.-S.L. and S.-K.J.; methodology, K.-S.L.; project administration,
K.-S.L. and S.-K.J.; software, K.-S.L.; supervision, J.-J.R., H.S.J., and D.-Y.L.; validation, S.-K.J.; visualization, S.-K.J.;
writing (original draft), K.-S.L. and S.-K.J.; writing (review and editing), K.-S.L., J.-J.R., H.S.J., D.-Y.L., and S.-K.J.
All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Acknowledgments: This research was supported by the Basic Science Research Program through the National
Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2019R1I1A1A01062961).
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Proffit, W.R.; Fields, H.W.; Sarver, D.M. Contemporary Orthodontics, 5th ed.; Mosby: St Louis, MO, USA, 2013.
2. Veiszenbacher, E.; Wang, J.; Davis, M.; Waite, P.D.; Borbely, P.; Kau, C.H. Virtual surgical planning: Balancing
esthetics, practicality, and anticipated stability in a complex Class III patient. Am. J. Orthod. Dentofac. Orthop.
2019, 156, 685–693. [CrossRef] [PubMed]
3. Geramy, A.; Sheikhzadeh, S.; Jalali, Y.F.; Nazarifar, A.M. Anthropometric Facial Changes After Orthognathic
Surgery and Their Relation With Oral Health Related Quality of Life. J. Craniofac. Surg. 2019, 30, 1118–1120.
[CrossRef] [PubMed]
4. Klein, K.P.; Kaban, L.B.; Masoud, M.I. Orthognathic Surgery and Orthodontics: Inadequate Planning Leading
to Complications or Unfavorable Results. Oral Maxillofac. Surg. Clin. N. Am. 2020, 32, 71–82. [CrossRef]
[PubMed]
5. Turpin, D.L. The orthodontic examination. Angle Orthod. 1990, 60, 3–4.
6. Heinz, J.; Stewart, K.; Ghoneima, A. Evaluation of two-dimensional lateral cephalometric radiographs and
three-dimensional cone beam computed tomography superimpositions: A comparative study. Int. J. Oral
Maxillofac. Surg. 2019, 48, 519–525. [CrossRef]
7. Manosudprasit, A.; Haghi, A.; Allareddy, V.; Masoud, M.I. Diagnosis and treatment planning of orthodontic
patients with 3-dimensional dentofacial records. Am. J. Orthod. Dentofac. Orthop. 2017, 151, 1083–1091.
[CrossRef]
8. Yue, W.; Yin, D.; Li, C.; Wang, G.; Xu, T. Automated 2-D cephalometric analysis on X-ray images by a
model-based approach. IEEE. Trans. Biomed. Eng. 2006, 53, 1615–1623.
9. Perillo, M.; Beideman, R.; Shofer, F. Effect of landmark identification on cephalometric measurements:
Guidelines for cephalometric analyses. Clin. Orthod. Res. 2000, 3, 29–36. [CrossRef]
10. Moore, J.W. Variation of the sella-nasion plane and its effect on SNA and SNB. J. Oral Surg. 1976, 34, 24–26.
11. Shin, H.C.; Roth, H.R.; Gao, M.; Lu, L.; Xu, Z.; Nogues, I.; Yao, J.; Mollura, D.; Summers, R.M. Deep
Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics
and Transfer Learning. IEEE Trans. Med. Imaging 2016, 35, 1285–1298. [CrossRef] [PubMed]
12. Wu, H.; Huang, Q.; Wang, D.; Gao, L. A CNN-SVM combined model for pattern recognition of knee motion
using mechanomyography signals. J. Electromyogr. Kinesiol. 2018, 42, 136–142. [CrossRef]
13. Arandjelovic, R.; Gronat, P.; Torii, A.; Pajdla, T.; Sivic, J. NetVLAD: CNN Architecture for Weakly Supervised
Place Recognition. IEEE. Trans. Pattern Anal. Mach. Intell. 2018, 40, 1437–1451. [CrossRef]
14. Zhao, X.; Qi, S.; Zhang, B.; Ma, H.; Qian, W.; Yao, Y.; Sun, J. Deep CNN models for pulmonary nodule
classification: Model modification, model integration, and transfer learning. J. X-ray Sci. Technol. 2019, 27,
615–629. [CrossRef] [PubMed]
15. Spadea, M.F.; Pileggi, G.; Zaffino, P.; Salome, P.; Catana, C.; Izquierdo-Garcia, D.; Amato, F.; Seco, J.
Deep Convolution Neural Network (DCNN) Multiplane Approach to Synthetic CT Generation From MR
images-Application in Brain Proton Therapy. Int. J. Radiat. Oncol. Biol. Phys. 2019, 105, 495–503. [CrossRef]
[PubMed]
16. Shah, N.; Chaudhari, P.; Varghese, K. Runtime Programmable and Memory Bandwidth Optimized
FPGA-Based Coprocessor for Deep Convolutional Neural Network. IEEE. Trans. Neural Netw. Learn Syst.
2018, 29, 5922–5934. [CrossRef] [PubMed]
17. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. arXiv 2015, arXiv:1512.03385.
18. Park, W.J.; Park, J.B. History and application of artificial neural networks in dentistry. Eur. J. Dent. 2018, 12,
594–601. [CrossRef]
19. Miotto, R.; Wang, F.; Wang, S.; Jiang, X.; Dudley, J.T. Deep learning for healthcare: Review, opportunities and
challenges. Brief Bioinform. 2018, 19, 1236–1246. [CrossRef]
20. Lee, J.H.; Kim, D.H.; Jeong, S.N.; Choi, S.H. Detection and diagnosis of dental caries using a deep
learning-based convolutional neural network algorithm. J. Dent. 2018, 77, 106–111. [CrossRef]
21. Wang, C.W.; Huang, C.T.; Hsieh, M.C.; Li, C.H.; Chang, S.W.; Li, W.C.; Vandaele, R.; Maree, R.; Jodogne, S.;
Geurts, P.; et al. Evaluation and Comparison of Anatomical Landmark Detection Methods for Cephalometric
X-Ray Images: A Grand Challenge. IEEE Trans. Med. Imaging 2015, 34, 1890–1900. [CrossRef]

22. Cheng, E.; Chen, J.; Yang, J.; Deng, H.; Wu, Y.; Megalooikonomou, V.; Gable, B.; Ling, H. Automatic
Dent-landmark detection in 3-D CBCT dental volumes. In Proceedings of the Annual International
Conference of the IEEE Engineering in Medicine and Biology Society, Boston, MA, USA, 30 August–3
September 2011; Volume 2011, pp. 6204–6207.
23. Jung, S.-K.; Kim, T.-W. New approach for the diagnosis of extractions with neural network machine learning.
Am. J. Orthod. Dentofac. Orthop. 2016, 149, 127–133. [CrossRef]
24. Li, P.; Kong, D.; Tang, T.; Su, D.; Yang, P.; Wang, H.; Zhao, Z.; Liu, Y. Orthodontic Treatment Planning based
on Artificial Neural Networks. Sci. Rep. 2019, 9, 2037. [CrossRef] [PubMed]
25. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks.
Adv. Neural Inf. Process. Syst. 2012, 1, 1097–1105. [CrossRef]
26. Alom, M.Z.; Taha, T.M.; Yakopcic, C.; Westberg, S.; Sidike, P.; Nasrin, M.S.; Esesn, B.C.; Awwal, A.A.;
Asari, V.K. The history began from alexnet: A comprehensive survey on deep learning approaches. arXiv
2018, arXiv:1803.01164.
27. Bottou, L. Large-scale machine learning with stochastic gradient descent. In Proceedings of the
COMPSTAT’2010, Paris, France, 22–27 August 2010; pp. 177–186.
28. Jung, Y.; Hu, J. A K-fold Averaging Cross-validation Procedure. J. Nonparametr. Stat. 2015, 27, 167–179.
[CrossRef]
29. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations
from deep networks via gradient-based localization. arXiv 2016, arXiv:1610.02391.
30. Lee, K.-S.; Jung, S.-K.; Ryu, J.-J.; Shin, S.-W.; Choi, J. Evaluation of Transfer Learning with Deep Convolutional
Neural Networks for Screening Osteoporosis in Dental Panoramic Radiographs. J. Clin. Med. 2020, 9, 392.
[CrossRef]
31. Hwang, H.W.; Park, J.H.; Moon, J.H.; Yu, Y.; Kim, H.; Her, S.B.; Srinivasan, G.; Aljanabi, M.N.A.; Donatelli, R.E.;
Lee, S.J. Automated identification of cephalometric landmarks: Part 2-Might it be better than human?
Angle Orthod. 2020, 90, 69–76. [CrossRef]
32. Michele, A.; Colin, V.; Santika, D.D. Mobilenet convolutional neural networks and support vector machines
for palmprint recognition. Procedia Comput. Sci. 2019, 157, 110–117. [CrossRef]
33. Choi, H.I.; Jung, S.K.; Baek, S.H.; Lim, W.H.; Ahn, S.J.; Yang, I.H.; Kim, T.W. Artificial Intelligent Model With
Neural Network Machine Learning for the Diagnosis of Orthognathic Surgery. J. Craniofac. Surg. 2019, 30,
1986–1989. [CrossRef]

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).
