Professional Documents
Culture Documents
Date of publication xxx 00, 0000, date of current version xxx 00, 0000.
Digital Object Identifier 10.1109/
ABSTRACT Sagittal cervical spine alignment measured on X-Ray is a key objective measure for clinicians
caring for patients with a multitude of presenting symptoms. Despite its applications, there has been no
research available in this field yet. This paper presents a framework for automatic detection of the
Sagittal cervical spine landmark point. Inspired by UNet, we propose an encoder-decoder Convolutional
Neural Network (CNN) called PoseNet. In developing our model, we first review the weaknesses of
widely used regression loss functions such as the L1, and L2 losses. To address these issues, we propose
a novel loss function specifically designed to improve the accuracy of the localization task under
challenging situations (extreme neck pose, low or high brightness and illumination, X-Ray noises, etc.)
We validate our model and loss function on a dataset of X-Ray images. The results show that our
framework is capable of performing precise sagittal cervical spine landmark point detection even for
challenging X-Ray images.
INDEX TERMS Neck Landmark Point Detection, Landmark Point Detection, Intensity Aware Loss, Custom
Loss Function, Medical Image Processing, X-Ray Image Processing, Convolutional Neural Network,
Computer Vision, Deep Learning
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see
https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final
10.110 publication. Citation information:
/ACCESS.2022.3180028, IEEE DOI
9 Access
Fardet al.: Sagittal Cervical Spine Landmark Points Detection in X-Ray Using Deep Convolutional Neural Networks
related to neck pain as well as distorted nervous system process for today’s clinician. To expedite this manual
function both sensorimotor and autonomic [14]. process, we have implemented an innovative neural
Consequently, spinal radiography is vital for the clinical network to automatically detect the anatomical location of
evaluation of sagittal cervical spine alignment to address the vertebrae which drastically reduces the clinician’s time
functional losses, pain, and weakness [15]. in mensuration of the sagittal cervical spine.
VOLUME 4, 2016 1
Heatmap regression studied by [16]–[23] is one intensity value of the pixels in this region more
of the most successful and widely used methods accurately than the BG region but not as accurate
among the proposed solutions for landmark point as the CF region. Accordingly, the model can learn
detection tasks. In heatmapbased regression, for the Gaussian distribution of heatmap channels.
each landmark point, we generate a channel (a Although almost all (to the best of our
two-dimensional matrix) using a Gaussian knowledge) of the previous work in landmark point
distribution centered at the location of the detection considered landmark localization as a
landmark point. We can model heatmap-based regression task, our proposed loss function
regression as F(X) = Y , where X ∈ RW×H×3 is an input implicitly considers such task as classification as
colored-image with the width and height of W and well. We propose the categorical loss, called CLoss,
H, respectively. Likewise, Y ∈ RWhm×Hhm×k is the which utilizes the cross-entropy (CE) loss to guide
predicted heatmap which is a kdimensional tensor the model to classify the pixels as CF, BG, or FG.
where the width, height and number of channels More specifically, for each pixel that belongs to FG,
are Whm, Hhm and k (the number of landmark and BG regions, CLoss guides the model to learn
point), respectively. Finding F(), a function that the region they belong to. Consequently, to guide
maps the input image to the set of heatmaps is the the model to focus on predicting the intensity
goal of the landmark localization task. In other value of the pixels located in the CF region more
words, the regression task is to predict the value of accurately, CLoss stops penalizing the model as the
each pixel in each heatmap channel. model correctly classifies the pixels in the BG and
L2 loss is widely used in heatmap-based FG regions. Consequently, CLoss guides the model
regression [19], [22], [24]–[26]. However, as L2 loss to learn the proposed regions rather than the
is insensitive to small errors, it is not the most intensity value of the pixels. After learning the
suitable loss function for heatmapbased regression distribution of the defined regions, the loss
landmark point detection. Thus, we propose a function only penalizes the model for accurately
novel loss function, called ILoss, in which for each predicting the intensity value of the pixels located
heatmap channel we define 3 different regions in the CF region. Such characteristics prevent the
(see Fig. 3): 1- CoreForeground (CF) region, which network to put a huge effort into predicting the
contains the pixels with the highest intensity exact intensity value of the pixels located in the BG
values. This region is considered as the most and FG regions, which can result in a more
crucial region since any inaccuracy in the accurate prediction of the intensity values of the
prediction of the intensity values of the pixels in pixels belonging to the CF region.
this region, will likely result in inaccurately The contributions of our approach are
predicted coordinates. 2- Background (BG) region, summarized as follows:
which is the region containing the pixels with the • We propose ILoss, which spans each
smallest intensity values. Regressing the intensity heatmap channel to the BG, CF and FG
value of the pixels in this region can be taken as a regions based on the intensity value of the
less-important task for the network. In other pixels.
words, since the coordinates of the facial landmark • We propose CLoss, which consider the
points heavily rely on the pixels that are located in landmark localization task as a classification
the CF region, finding the exact intensity value of problem and guides the model to classify the
the pixels in the BG region does not result in a pixels as CF, BG, or FG.
more accurate landmark localization task. 3- • We propose IC-Loss as a combination of ILoss
Foreground (FG) region, a region which can be and
considered as a boundary between the BG and CF
CLoss for localizing the sagittal cervical spine
regions. We penalize the model to predict the
landmark point more accurately.
2 VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see
https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final
publication.
10.1109 Citation information:IEEE
/ACCESS.2022.3180028, DOI Access
Fard et al. : Sagittal Cervical Spine Landmark Point Detection in X-Ray Using Deep Convolutional Neural Networks
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see
https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final
10.110 publication. Citation information:
/ACCESS.2022.3180028, IEEE DOI
9 Access
Fardet al.: Sagittal Cervical Spine Landmark Points Detection in X-Ray Using Deep Convolutional Neural Networks
proximal humerus. A two-stage algorithm [59] introduced Wingloss, a new loss function that
composed of a rigid image-to-image and a is capable of overcoming the widely used L2 loss
deformable surface-to-image registration is in conjunction with a strong data augmentation
proposed by Brehler et al. [46] for detecting a set method as well as a pose-based data balancing
of landmarks in the knee joint. Negrillo et al. [47] (PDB). To ease the parts variations and regresses
proposed a geometrically-based algorithm to the coordinates of different parts, TwoStage Re-
detect the landmarks of the humerus for further initialization Deep Regression MODEL (TSR) [60]
analysis of a reduction of supracondylar fractures. splits the face into several parts. Zhang et al. [61]
Likewise to the mentioned research, we choose to proposed Exemplar-based Cascaded Stacked
design a new loss function that can cope well with Auto-Encoder Network (ECSAN) for face
the challenges of landmark localization task. In alignment, which is utilized to handle partial
addition, in Sec.III we discuss the weakness of the occlusion in the image. To cope with self-
widely used L1, and L2 loss and try to design the occlusions and large face rotations, Valle et al.
IC-Loss to perform the sagittal cervical spine [62] proposed a face alignment algorithm based
localization task more accurately. on a coarse-to-fine cascade of ensembles of
regression trees, which is initialized by robustly
B. LANDMARK POINT DETECTION fitting a 3D face model to the probability maps
In general, there are three main methods for produced by a pre-trained convolutional neural
landmark point detection. We have classified and network (CNN). Fard et al. [63] proposed a
reviewed the previously proposed methods in the lightweight multi-task network for jointly
following. detecting facial landmark points as well as the
Classical models (aka template-based methods) estimation of face pose. In another work, a
are among the first methods that are designed for student-teacher network was proposed by Fard et
landmark point detection specifically face al. [64] for extracting facial landmark points using
alignment. Active Shape Model (ASM) [48] and more accurately using lightweight CNN models.
Active Appearance Model (AAM) [49], [50] which More recently, ACR Loss [65] proposed a loss
utilize Principal Components Analysis (PCA) to function to estimate the level of difficulty in
simplify the problem and learn parametric predicting each landmark point for the network
features of faces to model facial landmarks and hence improve the accuracy of the face
variations in an iterative manner. Further, Martins localization task.
et al. [50] proposed a 2.5D AAM that combines a In general, the Coordinate-based method is the
3D metric Point Distribution Model (PDM) with a most efficient model of landmark localization since
2D appearance model to match a 3D deformable the output of the model is the location of the
face model to 2D images. In addition, Cristinacce landmark points, and no postprocessing is
and Cootes [51] introduced the Constrained Local required. However, the accuracy of this approach
Model (CLM) which models the face shapes with is not very high as we lose much spatial
Procrustes analysis and principal component information.
analysis. Although CLM models and their various Heatmap-based regression models are another
extensions including [52]–[55], are among the widely used method for landmark point detection
most promising methods for face alignment, they tasks. First, the likelihood heatmaps for each
are sensitive to occlusion as well as illumination landmark point are created, and then the network
when detecting landmarks in unseen datasets. is trained to generate the heatmaps for each input
Robust Cascade Pose Regression (RCPR) [56] was image. A two-part network proposed by Yang et al.
introduced to detect occlusions explicitly and use [24], including a supervised transformation to
robust shape-indexed features. Another normalize faces and a stacked hourglass network
computationally lightweight method was Local [40], is designed to predict heatmaps. LAB [16]
Binary Features (LBF) [57]. proposed by Wu, expressed that the facial
Coordinate-based regression models predict boundary line contains valuable information.
the landmark coordinates vector from the input Hence, they utilized boundary lines as the
image directly. Mnemonic Descent Method geometric structure of a face to help facial
(MDM) [58] has utilized a recurrent convolutional landmark detection. Furthermore, for a better
network to detect facial landmarks. Feng et al. initialization to Ensemble of Regression Trees (ERT)
4 VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see
https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final
publication.
10.1109 Citation information:IEEE
/ACCESS.2022.3180028, DOI Access
Fard et al. : Sagittal Cervical Spine Landmark Point Detection in X-Ray Using Deep Convolutional Neural Networks
regressor, Valle et al. [66] proposed a simple CNN background pixels it is not applicable to heatmap-
to generate heatmaps of landmark locations. As based regression landmark point localization. To
well as that, JMFA [67] leveraged a stacked cope with this issue, Wang et al. [21] proposed
hourglass network for multi-view face alignment, Adaptive Wing Loss, a loss function that can adapt
and it achieved state-of-the-art accuracy and its curvature to different ground truth pixel values.
demonstrated more accuracy than the best three Consequently, it can be sensitive to small errors on
entries of the Menpo Challenge [68]. Moreover, foreground pixels while it is tolerant to small errors
HRNet being introduced by Sun et al. [22] is a high- on background pixels.
resolution network that is applicable in many Contrary to the previously proposed loss
Computer Vision tasks such as facial landmark functions, our proposed IC-Loss is designed to
detection and achieves a reasonable accuracy. firstly spans each heatmap channel to the BG, CF
Besides that, to deal with shape variations in facial and FG considering the intensity value of each
landmark detection, Iranmanesh et al. [20] pixel and penalizing the model accordingly. More
proposed an approach that provides a robust clearly, IC-Loss penalizes the model for accurately
algorithm that aggregates a set of manipulated predicting the intensity value of the pixels located
images to capture robust landmark representation. in the CF region, while it can tolerate more errors
Gaussian heatmap vectors proposed by Xiong et al. when it comes to the pixels located in either of
[19] to be used instead of the widely used the BG or the FG regions.
heatmap for facial landmark point detection. More
recently, [69] proposed a two-paired cascade III. METHODOLOGY
subnetwork to generate heatmaps and accordingly In this section, we first introduce our proposed
the coordinates of the facial landmark point to network architecture, PoseNet. Next, we illustrate
deal with the more challenging faces. how to generate the heatmap attention map.
Since the accuracy of the heatmap-based Then, we explain ILoss, CLoss, and the proposed
landmark localization is more than the Coordinate- IC-Loss.
based method, we decided to follow the former
approach. A. POSENET ARCHITECTURE
Custom Loss function is a remarkable approach Inspired by UNet [28], we propose an Encoder-
for improving the performance of both regression- Decoder network called PoseNet for the sagittal
based and classification-based [70] models. cervical spine landmark point localization task. As
Similarly, Custom loss functions play an important Fig. 1 shows, PoseNet contains 4 main modules
role in improving the accuracy of landmark including Head, Down-Sampling, Keep, and Up-
localization. However, there is only a few research Sampling.
that has proposed and studied the effect of using a The Head module contains 3 sets of 2-
taskrelated loss function in this context. Fard et al. dimensional convolution (Conv2D) layers followed
[63] have proposed ASMNet, a lightweight CNN as by a ReLU and a batchnormalization layer (see Fig.
well as ASMLoss which is designed to guide the 2). Each Conv2D layer has a 3 × 3 kernel size, 64
network toward learning a smoother version of the filters and stride as 2. We design the Head
facial landmark point. KD-Loss proposed by Fard et module to extract the feature from the input
al. [64] is a loss function designed to use the image and downscale it to of the size of the
knowledge gained by two teacher networks to original input image.
improve the accuracy of a student network for face As Fig. 2 shows, each Down-Sampling module
alignment task. GoDP [71] proposed a distance- has 2 sets of 3 × 3 Conv2D layers having the same
aware softmax loss to assign a large penalty on number of filters and stride equals 1, followed by
incorrectly classified positive samples. Then, they a ReLU and a batch-normalization layer. We
gradually reduce the penalty on the misclassified define 2 outputs for the Down-Sampling module
negative samples as the distance from nearby called Skip and Out. Skip is a skipconnection used
positive samples decreases. The Wing loss [59] is a to pass the information and features to the
modified log loss for coordinate-based face corresponding Up-Sampling module. Besides,
alignment which is designed to amplify the Down is followed by a MaxPooling layer (with
influence of small errors. However, since Wing loss pool_size=2), which is used to downscale the
[59] is very sensitive to small errors in the input by the scale of 2. In the DownSampling
VOLUME 4, 2016 5
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see
https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final
10.110 publication. Citation information:
/ACCESS.2022.3180028, IEEE DOI
9 Access
Fardet al.: Sagittal Cervical Spine Landmark Points Detection in X-Ray Using Deep Convolutional Neural Networks
modules, we multiply the number of the filter of stride=1 followed by a ReLU and a batch-
the Conv2D layers by a factor of 2 as we normalization layer. Then, a 2 × 2 DeConv2D with
downscale the image. The first Down-Sampling stride=2 is used to upscale the input by the factor
module has 64 filters, the second, third, and of 2. The number of filters we use for both Conv2D
fourth have 128, 256, and 512 filters, respectively. layers and the DeConv2D are the same. While the
In the Keep module, we only use a 3×3 Conv2D first Up-Sampling module has 512 filters, we
layer with 1024 filters and stride as 1, followed by decrease the filters by a factor of 2, so the second,
a ReLU and a batchnormalization layer. Then, we third, and fourth have 256, 128, and 64 filters
have a 2 × 2 Deconvolution (DeConv2D) layer respectively.
having 512 filters with stride equals to
Down- Up-
Head Sampling Sampling Keep B. HEATMAP ATTENTION MAP
Conv2D Conv2D We define a heatmap attention map based on the
ReLU ReLU ReLU ReLU intensity value of the ground-truth heatmap
batch-norm batch-norm batch-norm batch-norm
channels. Heatmap attention maps define the
Conv2D Conv2D Conv2D DeConv2D importance of each pixel in the ground-truth
ReLU ReLU ReLU ReLU heatmap. For an input image Img, we define
batch-norm batch-norm batch-norm batch-norm
H
Max HWhm×Hhm×N and ′
W hm×Hhm×N as the corresponding
Conv2D Pooling DeConv2D
ReLU ReLU Core
batch-norm Down Skip batch-norm Region
Conv2D
Intensity Value
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see
https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final
publication.
10.1109 Citation information:IEEE
/ACCESS.2022.3180028, DOI Access
Fard et al. : Sagittal Cervical Spine Landmark Point Detection in X-Ray Using Deep Convolutional Neural Networks
the height and the number of the channels of the more focus on predicting the accurate intensity
heatmaps respectively. For each heatmap value of the pixels which are important for the
channel, we define the BG, FG and CF regions landmark point detection task.
based on the intensity value of the corresponding
pixels. As Fig 3 shows, a pixel pij belongs to BG if C. ILOSS
its intensity value Iij in [0,θBG]. Likewise, pij belongs As Fig.3 shows (also discussed in Sec.I), predicting
to FG if Iij in (θBG,θFG], and to CF in case Iij in the accurate intensity value of the pixels which
(θFG,1]. Accordingly, we define wij, the attention belong to the CF region can heavily affect the final
for pj according accuracy of the landmark localization task.
to Eq. 1: Accordingly, we propose an intensity aware loss
1 ∀pij ∈ BG ∀pij ∈ function called ILoss, which is designed to penalize
the model based on the intensity value of each
FG
wj = pixel in the ground truth heatmap. Our proposed
ωFG ∀pij ∈ CF ILoss consists of 3 different loss functions which
ωCF
are designed with respect to the fact that the
where ωFG, and ωCF are the hyper parameters that
pixels in the CF, and FG, and BG regions contribute
define the magnitude of the attention map for the
to the accuracy of the final landmark localization
BG, FG and CF regions, respectively.
task differently, and accordingly, ILoss tends to
We define each heatmap channel using 2
penalize the model with 3 different loss functions.
dimensional Gaussian function using Eq. 2:
Before introducing each part of the ILoss, we
define 2 different functions called ∆ and IM()
respectively. Firstly, we define ∆m,i,j,k using Eq. 4:
(2)
where P = (px,py) is the location of the landmark ∆m,i,j,k = |Im,i,j,k − Im,i,j,k′ |
point and σ is the standard deviation of the
distribution. Following the typical practices, we where m ∈ M (M is the number of the images in
define σ = 1.5. Since predicting the intensity value the training set) is the index of the mth item in the
of the pixels belonging to CF region accurately is training set, and I and I′ indicate the ground truth
very crucial for generating the coordinates of the and the predicted intensity value of the pixel
landmark point, we define this region to be smaller located at i ∈ Hhm, j ∈ Whm and k ∈ N heatmap
compared to the other regions. Thus, we respectively. As the Eq.4 shows, ∆ is a function
empirically set θFG = 0.9, which means the pixels that shows the difference between the intensity
with intensity values between 0.9 and 1 belong to value of the corresponding pixels in the predicted
this region. Likewise, we set θBG = 0.4. and the ground truth heatmap.
Considering the ratio of the pixels exists in each Moreover, we define the Intensity Map function,
of the CF, FG, regions compared to the pixels in the IM(), using Eq. 5:
BG region in each heatmap channel H, we set the
values of ωCF, and ωFG, respectively, using Eq. 3:
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see
https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final
10.110 publication. Citation information:
/ACCESS.2022.3180028, IEEE DOI
9 Access
Fardet al.: Sagittal Cervical Spine Landmark Points Detection in X-Ray Using Deep Convolutional Neural Networks
1) ILoss for the BG Region make the influence of the loss function very low
For the pixels belonging to the BG region, there is while the prediction is considered as good
no need for a highly accurate prediction of the enough. Moreover, we define ϕBG = 0.5 which
A: BG Region B: FG Region C: CF Region
FIGURE 4. The loss function and the gradient of the loss in different regions.
intensity values. means for each pixel in the BG region, the
Thus, we define a threshold and consider the prediction is accurate if the error is less than 0.5.
prediction as accurate enough if the prediction As Fig. 4-A shows, we define LossBG such that
error is smaller than the threshold. The intensity the influence of the loss decreases as the
value of each pixel in a heatmap channel H can magnitude of the loss decreases. This
vary between 0, and 1, so we consider the characteristic of LossBG guides the model to put
prediction as accurate enough if ∆ ≤ ϕBG. The more focus on the prediction of the intensity
influence of the loss function should be high if the value of the pixels belonging to the FG, and CF
prediction error is more than the defined regions as soon as the model predict the intensity
threshold, ϕBG, while it should drop dramatically value of the pixels in BG accurately enough.
as soon as the prediction error becomes smaller
than the threshold. Having said that, ILoss can be 2) ILoss for the FG Region
a quadratic (y = αx2) function in the BG region. Compared to the BG region, it is much more
Consequently, we define LossBG as a pieces-wise important that the model to predict the intensity
function using Equations 6 and 7: value of the pixels belonging to the FG region
if: ∆m,i,j,k > ϕBG since learning the Gaussian distribution of a
LBG heatmap channel can further help the model to
1 otherwise predict the intensity value of the pixels in the CF
(6) region. The same as the proposed loss function
for the BG region, we define the error prediction
LossBG = threshold as ∆ ≤ ϕFG. The influence of the loss
LBGm,i,j,k function should be high if the prediction error is
greater than the threshold, so we define the loss
as a quadratic function for this condition. Then,
when the prediction error is smaller than the
M N Whm Hhm threshold, we define the loss function as a linear
where C1 is a constant used to smoothly connect 2 function. Contrary to LossBG, the gradient of the
pieces of LossBG together, and α is a linear part of LossFG is a constant value, and
hyperparameter that can set the magnitude and accordingly it penalizes the model independently
the influence of the loss function. We empirically from the magnitude of the prediction error. This
define α = 0.5 (and accordingly which characteristic of LossFG guides the model to
predict the intensity value of the pixels belonging
8 VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see
https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final
publication.
10.1109 Citation information:IEEE
/ACCESS.2022.3180028, DOI Access
Fard et al. : Sagittal Cervical Spine Landmark Point Detection in X-Ray Using Deep Convolutional Neural Networks
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see
https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final
10.110 publication. Citation information:
/ACCESS.2022.3180028, IEEE DOI
9 Access
Fardet al.: Sagittal Cervical Spine Landmark Points Detection in X-Ray Using Deep Convolutional Neural Networks
LC LCE LCF
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see
https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final
publication.
10.1109 Citation information:IEEE
/ACCESS.2022.3180028, DOI Access
Fard et al. : Sagittal Cervical Spine Landmark Point Detection in X-Ray Using Deep Convolutional Neural Networks
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see
https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final
10.110 publication. Citation information:
/ACCESS.2022.3180028, IEEE DOI
9 Access
Fardet al.: Sagittal Cervical Spine Landmark Points Detection in X-Ray Using Deep Convolutional Neural Networks
TABLE 4. X-Ray age per view type. All numbers are in % format.
white-foreground white-background
1) Preprocessing
Since the sagittal cervical spine X-Ray images in
our dataset have been captured with different
devices and technologies, there exists 2 different
types of X-Rays in the dataset: whiteforeground
images and white-background (see Fig. 6), while
we consider the bones as the foreground and
anything else as the background. We introduce
Original Flipped
the white-foreground X-Rays as the images in
which the intensity value of the pixels in the FIGURE 7. To deal with the huge diversity of the images in the dataset,
foreground segments are greater than the other we proposed a novel preprocessing method. We stack the equalized, the
gray-scale, and the inverted version of the input image together with 2
segments, while in the white-background X-Rays, different orders and save both the image and the flipped version of it.
the intensity value of the pixels in the bone
segments is less than the other segments. As the
discrepancy in types of X-Ray images can impact channels. Then, we normalize the generated gray-
the accuracy of the model negatively, we propose scale image such that the intensity value of each
a preprocessing algorithm to deal with this issue pixel be ∈ [0,1] and call the output img gs. Next, to
and eventually improve the accuracy of the improve the quality of the input X-Ray image, we
model. use the histogram-equalization technique and
As Fig. 7 shows, we first convert each input X- create the equalized version of the image from
Ray image to a gray-scale image by calculating the imggs, and call it imgeq. After that, we create the
mean of the RGB inverted version of imggs and call it imginv.
We create the output image called imgout, a 3-
TABLE 2. X-Ray date range per view type. All numbers are in % format. channel image generated by stacking img gs, imgeq,
Range LC LCE LCF Total and imginv in two different orders to make the
2008-2010 4.46 0.97 0.97 6.40 model learn to deal with both white-foreground,
2010-2014 18.65 2.45 2.41 23.51 and white-background X-Ray images. In the first
2010-2018 29.91 3.28 2.90 36.08
order, we stack imggs, imgeq, and imginv. In the
2018-2021 27.24 3.52 3.24 34
second order, we flip the images horizontally and
TABLE 3. X-Ray gender per view type. All numbers are in % format. then stack imggs, imginv, and imgeq (See Fig. 7). For
the training sample, we keep both of the
Gender LC LCE LCF Total
generated images, while for the test set, we
Male 31.23 3.27 3.56 38.52
randomly select only one.
Female 49.03 6.49 5.96 61.48
12 VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see
https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final
publication.
10.1109 Citation information:IEEE
/ACCESS.2022.3180028, DOI Access
Fard et al. : Sagittal Cervical Spine Landmark Point Detection in X-Ray Using Deep Convolutional Neural Networks
Using this preprocessing method, the model will normalizing factor is selected as the distance
be capable of dealing with both white-foreground between the pupils of the human eyes (or the
and white-background X-Ray images. The number distance between the outer corner of the eyes),
of whitebackground X-Ray images is lower we choose d as the distance between the
compared to the whitebackground X-Ray images. landmark point number 0 and 4 (see Fig 11). In
Hence, training the model without using the addition, we calculate NME for each vertebra to
proposed preprocessing method, the accuracy of better analyze the accuracy of the model.
the sagittal cervical spine landmark point detection Furthermore, we calculate failure rate (FR),
on the white-background X-Ray images is very low. defined as the proportion of failed detected
sagittal cervical spines, for a maximum error of
B. IMPLEMENTATION DETAILS 0.08 and 0.1. Cumulative Errors Distribution (CED)
Since the width and the height of the input X-Ray curve and the areaunder-the-curve (AUC) [73] are
images are large, we crop each image with respect reported as well.
to the minimum and maximum coordination of the
landmarks points in both X and Y axes. Then, we D. RESULTS ANALYSIS
resize each X-Ray image to have width and height In this section, we first evaluate the accuracy of
of 128, and 256 respectively. We augmented the our proposed network architecture, PoseNet.
XRay images in terms of rotation (from -45 to 45 Then, we evaluate the accuracy of our proposed
degrees), and contrast, brightness and color loss function. Moreover, we study the
modification to add robustness to the model. performance of our proposed PoseNet.
Furthermore, since generating heatmaps
having the same width and height as the input 1) Evaluation of PoseNet
cropped images is memory and processor- Since this is the first work in the field, we first
consuming, we created our heatmaps to be four compare the accuracy of our PoseNet trained
times smaller than the cropped input images, with the widely used L2 loss with 2 very popular
having width and height equal to 32 and 64 baseline models, MobileNetV2 [74] called mnv2,
respectively. We used the Adam optimizer [72] and ResNet50 [75], called res50. Moreover,
TABLE 5. Comparison of the NME (in %), the FR (in %), and the AUC of
for training the networks while choosing the different models versus PoseNet. All the models are trained with L2 loss.
learning rate 10−2, β1 = 0.9, β2 = 0.999, and decay Model NME(↓) FR(↓) AUC(↑)
= 10−5. We then trained the networks for about
150 epochs with a batch size of 60. We LC LCE LCF LC LCE LCF LC LCE LCF
implemented our networks using the TensorFlow mnv2 6.47 7.35 11.04 7.49 9.16 41.96 0.6100 0.5270 0.2069
res50 6.12 6.88 10.75 7.02 8.41 39.95 0.6375 57.21 0.2192
library and ran them on an NVidia 1080Ti GPU.
mnv2-hm 5.60 6.42 10.56 3.84 6.66 36.60 0.6971 0.6227 0.2490
res50-hm 4.79 5.20 7.68 2.92 2.5 16.07 0.7375 0.7527 0.4862
C. EVALUATION METRICS PoseNet 4.75 5.21 7.48 2.77 3.33 9.82 0.7602 0.7385 0.5067
We follow landmark localization tasks and employ since PoseNet is a heatmap-based regression, for
normalized mean error (NME), in Eq.18, to better comparison we create 2 encoder-decoder
measure our model’s accuracy. models using MobileNetV2 [74], and ResNet50
[75] as the encoders respectively, and introduce 3
sets of DeConv2D layers, with filters=256,
kernel=3, and stride=2, followed by a ReLU and a
batch-normalization layer, as the decoder. We call
where m is the number of samples in the test set,
the former model mnv2-hm, and the latter model
and n is number of the landmark point, Pxij, Pyij
res50-hm.
are the predicted points (jth landmark point of the
As Table 5 shows, the accuracy of the sagittal
ith sample) while Gxij, Gyij are the corresponding
cervical spine landmark point detection using
Ground-truths. We also define d as the
heatmap-based regression models is much better
normalizing factor for presenting the results in a
than the coordinate-based regression models.
unified manner (the accuracy of the model will
Moreover, while the NME for the mnv2hm is
not be affected by the size of the image, the
5.60%, 6.42%, and 10.56% for the LC, LCE, and LCF
cropping margin, etc.). Inspired by previously
respectively, these values are reduced to 4.79%
proposed work in facial alignment where the
(about
VOLUME 4, 2016 13
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see
https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final
10.110 publication. Citation information:
/ACCESS.2022.3180028, IEEE DOI
9 Access
Fardet al.: Sagittal Cervical Spine Landmark Points Detection in X-Ray Using Deep Convolutional Neural Networks
0.81% reduction), 5.20% (about 1.22% reduction), model and train the model using L1, L2, and IC-
and 7.68% (about 2.88% reduction) respectively Loss. As Fig 8 shows, the performance of the
using resn50-hm as the model. Using the PoseNet, landmark localization is much better when we
the NME reduces to 4.75%, 5.21%, and 7.48% train the model using IC-Loss compared to L1, and
indicating about 0.85%, 1.21%, 3.08% compared to L2 loss.
mnv2-hm for the LC, LCE, and LCF respectively. The
accuracy of PoseNet is better compared to E. ABLATION STUDY
resn50hm both in LC, and LCF subsets, while it is In this section, we first study the effect of the
slightly worse for the LCE subset. However, as hyperparameters defined in Equations 10 and 11.
Table 6 shows, the number of the model Then, we conduct experiments to measure how
parameters and the number of the floating-point positively the proposed CLoss can improve the
operations (FLOPs) of PoseNet is much smaller sagittal cervical spine landmark point detection
than that of resn50-hm. task.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see
https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final
publication.
10.1109 Citation information:IEEE
/ACCESS.2022.3180028, DOI Access
Fard et al. : Sagittal Cervical Spine Landmark Point Detection in X-Ray Using Deep Convolutional Neural Networks
Categorical Loss. In the second experiment, the accurate values for the pixels in CF are vital for
called the ultimate goal of the model which is the
FIGURE 9. The NME (in %) of the PoseNet using different values for the localization of the sagittal cervical spine landmark
hyper parameter β.
point.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see
https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final
10.110 publication. Citation information:
/ACCESS.2022.3180028, IEEE DOI
9 Access
Fardet al.: Sagittal Cervical Spine Landmark Points Detection in X-Ray Using Deep Convolutional Neural Networks
16 VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see
https://creativecommons.org/licenses/by/4.0/
FIGURE 11. Examples of the sagittal cervical spine landmark point detection using PoseNet trained with IC-Loss.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final
publication.
10.1109 Citation information:IEEE
/ACCESS.2022.3180028, DOI Access
Fard et al. : Sagittal Cervical Spine Landmark Point Detection in X-Ray Using Deep Convolutional Neural Networks
[5] H. Y. Seong, M. K. Lee, S. R. Jeon, S. W. Roh, S. C. Rhim, and J. Conference on Applications of Computer Vision, 2020, pp.
H. Park, “Prognostic factor analysis for management of 330–340.
chronic neck pain: can we predict the severity of neck pain [21] X. Wang, L. Bo, and L. Fuxin, “Adaptive wing loss for robust
with lateral cervical curvature?” Journal of Korean face alignment via heatmap regression,” in Proceedings of the
Neurosurgical Society, vol. 60, no. 4, p. 456, 2017. IEEE International Conference on Computer Vision, 2019, pp.
[6] K. Han, C. Lu, J. Li, G.-Z. Xiong, B. Wang, G.-H. Lv, and Y.-W. 6971–6981.
Deng, “Surgical treatment of cervical kyphosis,” European [22] K. Sun, Y. Zhao, B. Jiang, T. Cheng, B. Xiao, D. Liu, Y. Mu, X.
spine journal, vol. 20, no. 4, pp. 523–536, 2011. Wang, W. Liu, and J. Wang, “High-resolution representations
[7] C. Fernandez-de Las-Penas, C. Alonso-Blanco, M. Cuadrado, for labeling pixels and regions,” arXiv preprint
and J. Pareja, “Forward head posture and neck mobility in arXiv:1904.04514, 2019.
chronic tensiontype headache: a blinded, controlled study,” [23] J. Su, Z. Wang, C. Liao, and H. Ling, “Efficient and accurate face
Cephalalgia, vol. 26, no. 3, pp. 314–319, 2006. alignment by global regression and cascaded local
[8] A. Nagasawa, T. Sakakibara, and A. Takahashi, refinement,” in Proceedings of the IEEE Conference on
“Roentgenographic findings of the cervical spine in tension- Computer Vision and Pattern Recognition Workshops, 2019,
type headache,” Headache: The Journal of Head and Face pp. 0–0.
Pain, vol. 33, no. 2, pp. 90–95, 1993. [24] J. Yang, Q. Liu, and K. Zhang, “Stacked hourglass network for
[9] M. M. BRAAF and S. ROSNER, “Trauma of cervical spine as robust facial landmark localisation,” in Proceedings of the IEEE
cause of chronic headache,” Journal of Trauma and Acute Conference on Computer Vision and Pattern Recognition
Care Surgery, vol. 15, no. 5, pp. 441–446, 1975. Workshops, 2017, pp. 79–87.
[10] H. Vernon, I. Steiman, and C. Hagino, “Cervicogenic [25] Z. Ruan, C. Zou, L. Wu, G. Wu, and L. Wang, “Sadrnet: Self-
dysfunction in muscle contraction headache and migraine: a aligned dual face regression networks for robust 3d dense
descriptive study.” Journal of manipulative and physiological face alignment and reconstruction,” IEEE Transactions on
therapeutics, vol. 15, no. 7, pp. 418– 429, 1992. Image Processing, 2021.
[11] G. N. Ferracini, T. C. Chaves, F. Dach, D. Bevilaqua-Grossi, C. [26] Z. Shao, Z. Liu, J. Cai, and L. Ma, “Jaa-net: joint facial action
Fernándezde Las-Peñas, and J. G. Speciali, “Analysis of the unit detection and face alignment via adaptive attention,”
cranio-cervical curvatures in subjects with migraine with and International Journal of Computer Vision, vol. 129, no. 2, pp.
without neck pain,” Physiotherapy, vol. 103, no. 4, pp. 392– 321–340, 2021.
399, 2017. [27] H. Lee, M. Park, and J. Kim, “Cephalometric landmark
[12] T. J. Buell, A. L. Buchholz, J. C. Quinn, C. I. Shaffrey, and J. S. detection in dental x-ray images using convolutional neural
Smith, “Importance of sagittal alignment of the cervical spine networks,” in Medical Imaging 2017: Computer-Aided
in the management of degenerative cervical myelopathy,” Diagnosis, vol. 10134. International Society for Optics and
Neurosurgery Clinics of North America, vol. 29, no. 1, pp. 69– Photonics, 2017, p. 101341W.
82, 2018. [28] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional
[13] M. F. Shamji, C. P. Ames, J. S. Smith, J. M. Rhee, J. R. networks for biomedical image segmentation,” in
Chapman, and M. G. Fehlings, “Myelopathy and spinal International Conference on Medical image computing and
deformity: relevance of spinal alignment in planning surgical computer-assisted intervention. Springer, 2015, pp. 234–241.
intervention for degenerative cervical myelopathy,” Spine, [29] Z. Zhong, J. Li, Z. Zhang, Z. Jiao, and X. Gao, “An attention-
vol. 38, no. 22S, pp. S147–S148, 2013. guided deep regression model for landmark detection in
[14] I. M. Moustafa, A. Youssef, A. Ahbouch, M. Tamim, and D. E. cephalograms,” in International Conference on Medical Image
Harrison, “Is forward head posture relevant to autonomic Computing and Computer-Assisted Intervention. Springer,
nervous system function and cervical sensorimotor control? 2019, pp. 540–548.
cross sectional study,” Gait & posture, vol. 77, pp. 29–35, [30] J. Qian, M. Cheng, Y. Tao, J. Lin, and H. Lin, “Cephanet: An
2020. improved faster r-cnn for cephalometric landmark detection,”
[15] P. A. Oakley, J. M. Cuttler, and D. E. Harrison, “X-ray imaging is in 2019 IEEE 16th International Symposium on Biomedical
essential for contemporary chiropractic and manual therapy Imaging (ISBI 2019). IEEE, 2019, pp. 868–871.
spinal rehabilitation: radiography increases benefits and [31] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: towards
reduces risks,” Dose-Response, vol. 16, no. 2, p. real-time object detection with region proposal networks,”
1559325818781437, 2018. IEEE transactions on pattern analysis and machine
[16] W. Wu, C. Qian, S. Yang, Q. Wang, Y. Cai, and Q. Zhou, “Look intelligence, vol. 39, no. 6, pp. 1137–1149, 2016.
at boundary: A boundary-aware face alignment algorithm,” in [32] C. Leibig, V. Allken, M. S. Ayhan, P. Berens, and S. Wahl,
Proceedings of the IEEE Conference on Computer Vision and “Leveraging uncertainty information from deep neural
Pattern Recognition, 2018, pp. 2129–2138. networks for disease detection,” Scientific reports, vol. 7, no.
[17] Z. Tong and J. Zhou, “Face alignment using two-stage 1, pp. 1–14, 2017.
cascaded pose regression and mirror error correction,” [33] J.-H. Lee, H.-J. Yu, M.-j. Kim, J.-W. Kim, and J. Choi,
Pattern Recognition, p. 107866, 2021. “Automated cephalometric landmark detection with
[18] Y. Ge, J. Zhang, C. Peng, M. Chen, J. Xie, and D. Yang, “Deep confidence regions using bayesian convolutional neural
shape constrained network for robust face alignment,” networks,” BMC oral health, vol. 20, no. 1, pp. 1–10, 2020.
Pattern Recognition Letters, vol. 138, pp. 587–593, 2020. [34] X. Dai, H. Zhao, T. Liu, D. Cao, and L. Xie, “Locating anatomical
[19] Y. Xiong, Z. Zhou, Y. Dou, and Z. Su, “Gaussian vector: An landmarks on 2d lateral cephalograms through adversarial
efficient solution for facial landmark detection,” in encoder-decoder networks,” IEEE Access, vol. 7, pp. 132738–
Proceedings of the Asian Conference on Computer Vision, 132747, 2019.
2020. [35] M. Zeng, Z. Yan, S. Liu, Y. Zhou, and L. Qiu, “Cascaded
[20] S. M. Iranmanesh, A. Dabouei, S. Soleymani, H. Kazemi, and N. convolutional networks for automatic cephalometric
Nasrabadi, “Robust facial landmark detection via aggregation landmark detection,” Medical Image Analysis, vol. 68, p.
on geometrically manipulated faces,” in The IEEE Winter 101904, 2021.
VOLUME 4, 2016 17
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see
https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final
10.110 publication. Citation information:
/ACCESS.2022.3180028, IEEE DOI
9 Access
Fardet al.: Sagittal Cervical Spine Landmark Points Detection in X-Ray Using Deep Convolutional Neural Networks
[36] J. Zhang, M. Liu, L. Wang, S. Chen, P. Yuan, J. Li, S. G.-F. Shen, [52] A. Asthana, S. Zafeiriou, S. Cheng, and M. Pantic, “Robust
Z. Tang, K.-C. Chen, J. J. Xia et al., “Context-guided fully discriminative response map fitting with constrained local
convolutional networks for joint craniomaxillofacial bone models,” in Proceedings of the IEEE conference on computer
segmentation and landmark digitization,” Medical image vision and pattern recognition, 2013, pp.
analysis, vol. 60, p. 101621, 2020. 3444–3451.
[37] Z. Zhong, J. Li, Z. Zhang, Z. Jiao, and X. Gao, “A coarse-to-fine [53] T. Baltrušaitis, P. Robinson, and L.-P. Morency, “3d
deep heatmap regression method for adolescent idiopathic constrained local model for rigid and non-rigid facial
scoliosis assessment,” in International Workshop and tracking,” in 2012 IEEE conference on computer vision and
Challenge on Computational Methods and Clinical pattern recognition. IEEE, 2012, pp. 2610–2617.
Applications for Spine Imaging. Springer, 2019, pp. 101–106. [54] J. M. Saragih, S. Lucey, and J. F. Cohn, “Deformable model
[38] T. Ma, A. Gupta, and M. R. Sabuncu, “Volumetric landmark fitting by regularized landmark mean-shift,” International
detection with a multi-scale shift equivariant neural network,” journal of computer vision, vol. 91, no. 2, pp. 200–215, 2011.
in 2020 IEEE 17th International Symposium on Biomedical [55] Y. Wang, S. Lucey, and J. F. Cohn, “Enforcing convexity for
Imaging (ISBI). IEEE, 2020, pp. 981–985. improved alignment with constrained local models,” in 2008
[39] B. Chen, Q. Xu, L. Wang, S. Leung, J. Chung, and S. Li, “An IEEE Conference on Computer Vision and Pattern Recognition.
automated and accurate spine curve analysis system,” IEEE IEEE, 2008, pp. 1–8.
Access, vol. 7, pp. 124596–124605, 2019. [56] X. P. Burgos-Artizzu, P. Perona, and P. Dollár, “Robust face
[40] A. Newell, K. Yang, and J. Deng, “Stacked hourglass networks landmark estimation under occlusion,” in Proceedings of the
for human pose estimation,” in European conference on IEEE International Conference on Computer Vision, 2013, pp.
computer vision. Springer, 2016, pp. 483–499. 1513–1520.
[41] J. Wang, K. Sun, T. Cheng, B. Jiang, C. Deng, Y. Zhao, D. Liu, Y. [57] S. Ren, X. Cao, Y. Wei, and J. Sun, “Face alignment at 3000 fps
Mu, M. Tan, X. Wang et al., “Deep high-resolution via regressing local binary features,” in Proceedings of the
representation learning for visual recognition,” IEEE IEEE Conference on Computer Vision and Pattern Recognition,
transactions on pattern analysis and machine intelligence, vol. 2014, pp. 1685–1692.
43, no. 10, pp. 3349–3364, 2020. [58] G. Trigeorgis, P. Snape, M. A. Nicolaou, E. Antonakos, and S.
[42] Y. Huang, F. Fan, C. Syben, P. Roser, L. Mill, and A. Maier, Zafeiriou, “Mnemonic descent method: A recurrent process
“Cephalogram synthesis and landmark detection in dental applied for end-to-end face alignment,” in Proceedings of the
cone-beam ct systems,” Medical Image Analysis, vol. 70, p. IEEE Conference on Computer Vision and Pattern Recognition,
102028, 2021. 2016, pp. 4177–4187.
[43] D. Zhang, J. Wang, J. H. Noble, and B. M. Dawant, [59] Z.-H. Feng, J. Kittler, M. Awais, P. Huber, and X.-J. Wu, “Wing
“Headlocnet: loss for robust facial landmark localisation with convolutional
Deep convolutional neural networks for accurate neural networks,” in Proceedings of the IEEE Conference on
classification and multilandmark localization of head cts,” Computer Vision and Pattern Recognition, 2018, pp. 2235–
Medical image analysis, vol. 61, p. 101659, 2020. 2245.
[44] M. Urschler, T. Ebner, and D. Štern, “Integrating geometric [60] J. Lv, X. Shao, J. Xing, C. Cheng, and X. Zhou, “A deep
configuration and appearance information into a unified regression architecture with two-stage re-initialization for
framework for anatomical landmark localization,” Medical high performance facial landmark detection,” in Proceedings
image analysis, vol. 43, pp. 23–36, 2018. of the IEEE Conference on Computer Vision and Pattern
[45] S. Poltaretskyi, J. Chaoui, M. Mayya, C. Hamitouche, M. Recognition, 2017, pp. 3317–3326.
Bercik, P. Boileau, and G. Walch, “Prediction of the pre- [61] J. Zhang and H. Hu, “Exemplar-based cascaded stacked auto-
morbid 3d anatomy of the proximal humerus based on encoder networks for robust face alignment,” Computer
statistical shape modelling,” The bone & joint journal, vol. 99, Vision and Image Understanding, vol. 171, pp. 95–103, 2018.
no. 7, pp. 927–933, 2017. [62] R. Valle, J. M. Buenaposada, A. Valdés, and L. Baumela, “Face
[46] M. Brehler, G. Thawait, J. Kaplan, J. Ramsay, M. J. Tanaka, S. alignment using a 3d deeply-initialized ensemble of regression
Demehri, J. H. Siewerdsen, and W. Zbijewski, “Atlas-based trees,” Computer Vision and Image Understanding, vol. 189,
algorithm for automatic anatomical measurements in the p. 102846, 2019.
knee,” Journal of Medical Imaging, vol. 6, no. 2, p. 026002, [63] A. P. Fard, H. Abdollahi, and M. Mahoor, “Asmnet: A
2019. lightweight deep neural network for face alignment and pose
[47] J. Negrillo-Cárdenas, J.-R. Jiménez-Pérez, H. Cañada-Oya, F. R. estimation,” in Proceedings of the IEEE/CVF Conference on
Feito, and A. D. Delgado-Martínez, “Automatic detection of Computer Vision and Pattern Recognition, 2021, pp. 1521–
landmarks for the analysis of a reduction of supracondylar 1530.
fractures of the humerus,” Medical Image Analysis, vol. 64, p. [64] A. P. Fard and M. H. Mahoor, “Facial landmark points
101729, 2020. detection using knowledge distillation-based neural
[48] T. Cootes, E. Baldock, and J. Graham, “An introduction to networks,” Computer Vision and Image Understanding, p.
active shape models,” Image processing and analysis, pp. 103316, 2021.
223–248, 2000. [65] A. P. Fard and M. Mahoor, “Acr loss: Adaptive coordinate-
[49] T. F. Cootes, G. J. Edwards, and C. J. Taylor, “Active based regression loss for face alignment,” arXiv preprint
appearance models,” in European conference on computer arXiv:2203.15835, 2022.
vision. Springer, 1998, pp. 484–498. [66] R. Valle, J. M. Buenaposada, A. Valdés, and L. Baumela, “A
[50] P. Martins, R. Caseiro, and J. Batista, “Generative face deeplyinitialized coarse-to-fine ensemble of regression trees
alignment through 2.5 d active appearance models,” for face alignment,” in Proceedings of the European
Computer Vision and Image Understanding, vol. 117, no. 3, Conference on Computer Vision (ECCV), 2018, pp. 585–601.
pp. 250–268, 2013. [67] J. Deng, G. Trigeorgis, Y. Zhou, and S. Zafeiriou, “Joint multi-
[51] D. Cristinacce and T. F. Cootes, “Feature detection and view face alignment in the wild,” IEEE Transactions on Image
tracking with constrained local models.” in Bmvc, vol. 1, no. 2. Processing, vol. 28, no. 7, pp. 3636–3648, 2019.
Citeseer, 2006, p. 3.
18 VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see
https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final
publication.
10.1109 Citation information:IEEE
/ACCESS.2022.3180028, DOI Access
Fard et al. : Sagittal Cervical Spine Landmark Point Detection in X-Ray Using Deep Convolutional Neural Networks
[68] S. Zafeiriou, G. Trigeorgis, G. Chrysos, J. Deng, and J. Shen, ANNE-LISE DUPUIS received her
“The menpo facial landmark localisation challenge: A step Bachelor of Science degree in
towards the solution,” in Proceedings of the IEEE Conference Computing Science from the
on Computer Vision and Pattern Recognition Workshops, University of Alberta in 1980. She was
2017, pp. 170–179. awarded a substantial scholarship
[69] S. Mahpod, R. Das, E. Maiorana, Y. Keller, and P. Campisi, from the National Research Council
“Facial landmarks localization using cascaded neural of Canada in 1980 towards a master’s
networks,” Computer Vision and Image Understanding, p. degree which she declined after
103171, 2021. working as a computer consultant for
[70] A. P. Fard and M. H. Mahoor, “Ad-corre: Adaptive correlation- large firms. Having worked in various scientific, process
based loss for facial expression recognition in the wild,” IEEE control, business, and computer environments as a Project
Access, vol. 10, pp. 26756–26768, 2022. leader, Database Administrator, Systems Analyst,
[71] Y. Wu, S. K. Shah, and I. A. Kakadiaris, “Godp: Globally and Programmer, Anne-Lise has a rich background to draw
optimized dual pathway deep network architecture for facial upon. Currently, she resides as the lead programming
landmark localization in-thewild,” Image and Vision architect for application development at PostureCo.
Computing, vol. 73, pp. 1–16, 2018.
[72] D. P. Kingma and J. Ba, “Adam: A method for stochastic
optimization,” arXiv preprint arXiv:1412.6980, 2014.
[73] H. Yang, X. Jia, C. C. Loy, and P. Robinson, “An empirical study
of recent face alignment methods,” arXiv preprint
arXiv:1511.05049, 2015. MOHAMMAD H. MAHOOR received
[74] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, the MS degree in Biomedical
“Mobilenetv2: Inverted residuals and linear bottlenecks,” in Engineering from Sharif University of
Proceedings of the IEEE Conference on Computer Vision and Technology, Iran, in 1998, and the
Pattern Recognition, 2018, pp. 4510–4520. Ph.D. degree in Electrical and
[75] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for Computer Engineering from the
image recognition,” in Proceedings of the IEEE conference on University of Miami, Florida, in 2007.
computer vision and pattern recognition, 2016, pp. 770–778. Currently, he is a professor of
ALI POURRAMEZAN FARD received Electrical and Computer Engineering
the MSc degree in Computer at the University of Denver. He does
Engineering - from Iran University of research in the area of computer vision and machine learning
Science and Technology, Tehran, Iran, including visual object recognition, object tracking, affective
in 2015. He is currently pursuing his computing,
Ph.D. degree in Electrical & Computer and human-robot interaction (HRI) such as humanoid social
engineering and is a graduate robots for interaction and intervention of children with autism
teaching assistant in the Department and older adults with depression and dementia. He has
of Electrical and Computer received over $7M in research funding from state and federal
Engineering at the University of agencies including the National Science Foundation and the
Denver. His research interests include Computer Vision, National Institute of Health. He is a Senior Member of IEEE
Machine Learning, and Deep and has published over 158 conference and journal papers.
Neural Networks, especially in face alignment,
and facial expression analysis.
VOLUME 4, 2016 19
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see
https://creativecommons.org/licenses/by/4.0/