
Computerized Medical Imaging and Graphics 95 (2022) 102022


A relation-based framework for effective teeth recognition on dental periapical X-rays

Kailai Zhang a,*, Hu Chen b, Peijun Lyu b, Ji Wu a

a Department of Electronic Engineering, Tsinghua University, Beijing, China
b Center of Digital Dentistry, Peking University School and Hospital of Stomatology & National Engineering Laboratory for Digital and Material Technology of Stomatology, Beijing, China

* Correspondence to: Faculty of Electronic Engineering, Tsinghua University, Beijing, China. E-mail address: zhangkl17@mails.tsinghua.edu.cn (K. Zhang).

https://doi.org/10.1016/j.compmedimag.2021.102022
Received 27 May 2021; Received in revised form 5 November 2021; Accepted 12 November 2021; Available online 2 December 2021

A R T I C L E  I N F O

Keywords:
Teeth recognition
Convolutional neural network
Label reconstruction
Proposal correlation module

A B S T R A C T

Dental periapical X-rays are a popular diagnostic tool for dentists. To provide dentists with diagnostic support, in this paper we achieve automated teeth recognition in dental periapical X-rays, including teeth location and classification, by using deep learning techniques. The convolutional neural network (CNN) is a popular method and has brought large improvements to medical image applications. However, in our specific task, the performance of CNNs is limited by the lack of data and the large number of teeth positions in X-rays. To address this problem, we utilize prior dental knowledge and propose a relation-based framework to handle the teeth location and classification task. According to the relations in teeth labels, we apply a special label reconstruction technique to decompose the teeth classification task, and use a multi-task CNN to classify the teeth positions. Meanwhile, for the teeth location task, we design a proposal correlation module that uses the information between teeth positions, and insert it into the multi-task CNN. A teeth sequence refinement module is used for post processing. Our experiment results show that our relation-based framework achieves high teeth classification and location performance, a large improvement over the direct use of famous detection structures. With reliable teeth information, our method can provide automated diagnostic support for dentists.

1. Introduction

Oral diseases have been classified by the World Health Organization (WHO) as the third largest disease group after cancer and cerebrovascular diseases, and they have a strong impact on quality of life (Sanders et al., 2010). For oral disease diagnosis, dentists first need to recognize the teeth in dental X-rays, including finding their locations and classifying their positions, which makes teeth recognition a key step in the whole process. Meanwhile, unlike face and fingerprint features, which can be difficult to obtain when a person dies in a disaster or traffic accident, teeth often remain unbroken because they are the hardest part of the human body, so court doctors can use teeth for person recognition (Tohnak et al., 2007; Nomir and Abdel-Mottaleb, 2005; Zhou and Abdel-Mottaleb, 2005) and postmortem identification (Daniels and Troy, 1984; Nomir and Abdel-Mottaleb, 2007; Corrêa et al., 2020). Therefore, teeth detection and classification is very important for both dentists and court doctors.

For these reasons, in this paper we aim to achieve automated teeth recognition in dental periapical X-rays. Dental periapical X-rays are a popular tool for dentists because they provide details of specific teeth areas, so this type of radiograph is more representative for showing disease conditions. Unlike dental bitewing X-rays, which depict all the teeth in a single image, a dental periapical X-ray only contains several teeth, which makes the teeth positions more difficult to recognize, especially for inexperienced dentists, because there is less global information. Some examples are shown in Fig. 1. A similar situation arises for court doctors when recognizing the teeth of a deceased person, because teeth may fall from the body and it is very hard to get a bitewing image without the person's cooperation; for example, the jaws of a victim can easily disintegrate in a flight accident. In these cases, an automated teeth recognition system for dental periapical X-rays is meaningful to provide reliable assistance for medical experts.

However, automated teeth recognition in dental periapical X-rays also has some challenges. On the one hand, compared to magnetic resonance (MR) images and images generated by computed tomography (CT), X-rays are less clear and therefore more difficult to recognize. On the other hand, an adult has 32 teeth in total, which means that there are 32 teeth positions to recognize.


Fig. 1. Some examples of dental periapical X-rays. Some teeth positions are difficult for inexperienced dentists to recognize. For instance, the teeth in (a) are easily mistaken for tooth 7 and tooth 8, while in fact they are tooth 6 and tooth 7.

Compared to the large number of teeth positions, the number of labeled dental X-rays is usually deficient due to the difficulty of data acquisition and labeling and the issue of patient privacy; this is a common situation in medical image applications. The large number of teeth positions combined with the lack of data makes it difficult to achieve good performance.

There are some previous attempts related to automated teeth recognition in dental X-rays. Some methods are based on traditional feature extraction, such as the contour detection method (Arbelaez et al., 2011), the level set method (Li et al., 2006) and graph-based methods (Carreira and Sminchisescu, 2011; Boykov, 2001; Felzenszwalb and Huttenlocher, 2004). Kumar and Rajiv (2016) calculated the similarity of different images containing teeth for person identification. Lin et al. (2010) used region and contour information as the feature vector, then applied a support vector machine for teeth classification in dental bitewing X-rays. Mahoor and Abdel-Mottaleb (2005) used Fourier descriptors of teeth contours as features and applied a Bayesian method to design a teeth numbering system for dental bitewing X-rays. Said et al. (2006) applied a mathematical morphology method for teeth segmentation. Shah et al. (2006) used an active contour method to extract teeth contours for teeth recognition. Yuniarti (2012) used a binary support vector machine to classify each extracted tooth into molar or premolar for both bitewing and panoramic radiographs. Aeini and Mahmoudi (2010) considered both the location and the size of the mesiodistal neck of each tooth position and applied a linear model for teeth classification. Tangel et al. (2013) analyzed each tooth based on multiple criteria, such as the area/perimeter ratio and the width/height ratio, for teeth classification on periapical X-rays. Rad et al. (2013) used clustering and texture statistics techniques for teeth segmentation. These methods use hand-designed features, which require less labeling effort, but the traditional feature extractors are easily influenced by the variance in X-rays.

In the last few years, deep learning methods have developed rapidly. The convolutional neural network (CNN) is a popular deep learning method, and CNN models such as VGG16 (Simonyan and Zisserman, 2014), Resnet (He et al., 2016) and Densenet (Gao et al., 2017) have achieved great success in natural image applications.


Fig. 2. Our proposed relation-based framework, including the multi-task CNN and a teeth sequence refinement module.

They have also brought big improvements to medical image applications such as cancer cell detection (Zhang et al., 2017), lung disease classification (Li et al., 2014) and brain segmentation (Dey and Hong, 2018), and they are more robust than traditional feature extractors. Therefore we choose the CNN as the basic model for our task. There are also some previous works using CNN-based methods for teeth recognition. Zhang et al. (2018) used a special label tree for teeth detection in dental periapical X-rays and then numbered the teeth according to some rules. Chen et al. (2019) designed a CNN for teeth classification and used some teeth templates for correction. Mahdi et al. (2020) and Motoki et al. (2020) used faster R-CNN (Ren et al., 2015) for teeth detection and then designed rules according to prior knowledge for the refinement of the teeth sequence. Tuzoff et al. (2019) first detected teeth in dental panoramic radiographs and then used VGG16 (Simonyan and Zisserman, 2014) for teeth classification. Lin et al. (2014) segmented the tooth area based on local singularity analysis. Rana et al. (2017) used a CNN to detect gingivitis in color images with teeth locations, and Jae-Hong et al. (2018) used a CNN to directly judge whether teeth are healthy without specific teeth positions. These methods achieve good performance on their tasks. However, most of them are not designed for dental periapical X-rays, and they do not consider prior dental knowledge.

Different from natural image applications, there is much prior knowledge in medical image applications, which is very important for improving performance. In some cases, the improvement from using prior knowledge can be even larger than that from using different CNN models (Liao et al., 2018; Nosrati and Hamarneh, 2016). As mentioned above, we use CNN-based methods in our task. Since the data is deficient compared to the large number of teeth positions, we utilize prior dental knowledge to prevent overfitting of the CNN and get better performance. In dental X-rays, the prior knowledge is that the 32 teeth positions are dependent. This relationship is reflected in two aspects: first, each tooth label carries multiple pieces of information; second, each tooth position is bounded by its neighbors. We design our relation-based framework according to this prior knowledge. Based on the multiple information in teeth labels, we propose a special label reconstruction technique to decompose the whole teeth classification task into three subtasks and design a multi-task CNN accordingly, which can be seen as a divide-and-conquer method. Considering the information between teeth positions, we also design a proposal correlation module and insert it into the multi-task CNN. Finally, some rules generated from prior knowledge are used for the refinement of the teeth sequence output by the multi-task CNN. The experiment results show that our relation-based framework achieves a big improvement in teeth recognition performance compared to the direct use of famous detection structures.

The main contributions of this paper are: (1) We design a relation-based framework for teeth recognition in dental periapical X-rays, which improves the performance a lot compared to the direct use of famous detection structures. (2) We propose a special label reconstruction technique to decompose the teeth classification task according to the multiple information in teeth labels. (3) We design a proposal correlation module, which utilizes the information between teeth positions to get better performance.

The remainder of this paper is organized as follows: Section 2 introduces our dataset and the proposed relation-based framework with its key components, including the label reconstruction technique with the multiple-branches design, the proposal correlation module, and the teeth sequence refinement module for post processing. The experiment results and relevant discussion are presented in Section 3, and Section 4 gives the conclusion.

2. Materials and methods

2.1. Dataset description

The dataset of dental periapical X-rays is provided by Peking University School and Hospital of Stomatology. It has 1250 images in total, including 4336 teeth over the 32 teeth positions. Each X-ray has a size of approximately (300 to 500) × (300 to 400) pixels and contains fewer than 5 teeth in most cases. All the images are collected anonymously. The 32 teeth positions are labeled with bounding boxes by professional dentists using the Fédération Dentaire Internationale (FDI) teeth numbering system (ISO 3950), so the labeling results are regarded as the gold standard in our experiments. For performance evaluation we use 10-fold cross validation, which means that in each round 1125 images are used for network training and 125 images for testing. The average performance over the rounds is reported to ensure that the results are robust.
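As a small illustration of this protocol, the sketch below shows one way the 10-fold split could be produced; the function and variable names are hypothetical, not from the authors' code.

```python
# A minimal sketch of the 10-fold evaluation protocol described above,
# assuming a list of 1250 image identifiers. Names are illustrative only.
def ten_fold_splits(image_ids, k=10):
    fold_size = len(image_ids) // k              # 1250 // 10 = 125 test images
    for i in range(k):
        test = image_ids[i * fold_size:(i + 1) * fold_size]
        train = image_ids[:i * fold_size] + image_ids[(i + 1) * fold_size:]
        yield train, test                        # 1125 training / 125 test images
```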
2.2. System framework

In this section, we propose our system framework. As mentioned above, we design a relation-based framework to achieve automated teeth recognition, which is shown in Fig. 2. Our system is a multi-task CNN with a post processing module: it receives a dental periapical X-ray as input and outputs all the teeth bounding boxes with classification results. The basic structure of our framework is an object detection network. Faster R-CNN (Ren et al., 2015) is a famous object detection method that has achieved great success on many image recognition tasks, so we use faster R-CNN as our basic structure.


Fig. 3. Our multiple branches according to the label reconstruction technique. Compared to a traditional detection branch and a classification branch with many classes (top), we design three classification branches that each have fewer classes (bottom). FL denotes a fully connected layer and the number gives the output dimension of each branch.

Faster R-CNN includes a convolutional backbone, a region proposal network (RPN) and a ROIPooling layer. The convolutional backbone usually includes several convolutional layers, pooling layers and batch normalization layers, and it outputs the processed feature map. After the convolutional backbone, the region proposal network generates proposal candidates from the feature map. The ROIPooling layer extracts each proposal candidate into a fixed-length vector, and finally the proposal candidates are sent to a classification branch and a detection branch to obtain the object classification and detection results.

Based on faster R-CNN, we apply Resnet (He et al., 2016) as the convolutional backbone. Considering the multiple information in teeth labels, we propose a special label reconstruction technique and accordingly design a multiple-branches structure to output the results, which can be regarded as a multi-task CNN. Based on the relationship between teeth, we also design a proposal correlation module and insert it into the multi-task CNN. A teeth sequence refinement module is used for post processing after the multi-task CNN. More details are introduced in the following sections.

2.3. Label reconstruction

As mentioned above, compared to the amount of training data, the number of teeth positions is too large to train the network directly. However, according to prior knowledge, the 32 teeth positions are related and can be divided into four parts: the right upper teeth (RUT), the left upper teeth (LUT), the right down teeth (RDT), and the left down teeth (LDT). Each part has 8 teeth positions, named T1 to T8 from the middle of the jaw to each side. We order the teeth sequence in front view, so that all the teeth positions can be labeled as below:

RUT8, RUT7, ..., RUT1, LUT1, ..., LUT7, LUT8
RDT8, RDT7, ..., RDT1, LDT1, ..., LDT7, LDT8

According to this division, each label carries three different marks: R or L, U or D, and 1–8. Our label reconstruction technique combines some of these marks to get three kinds of new labels: tooth (T); upper tooth (UT) or down tooth (DT); and tooth 1 (T1) to tooth 8 (T8). The mark R or L can be obtained from the output teeth sequence, so we do not consider it at this stage. Accordingly, instead of directly predicting the classes of the 32 teeth positions for the proposal candidates, we design three classification branches and use the three kinds of new labels as groundtruths, as shown in Fig. 3. This decomposition can be seen as a divide-and-conquer method, and our framework is a multi-task CNN. Each branch includes a fully connected layer and represents a subtask. The first branch determines whether a proposal candidate is a tooth. The second branch judges whether a proposal candidate is an upper tooth or a down tooth. The third branch classifies the specific teeth position from T1 to T8. Therefore all three branches have far fewer classes than the direct prediction of 32 teeth positions. The fourth branch is the detection branch, which is the same as that in faster R-CNN.
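To make the decomposition concrete, the sketch below shows how one of the 32 position labels could be split into the three branch groundtruths. The string encoding of a position (e.g. "RUT3") and all names are our own illustrative choices, not the authors' implementation.

```python
def reconstruct_label(position):
    """Decompose a 32-position label such as "RUT3" into the three sub-labels
    used as branch groundtruths; the R/L mark is deferred to the teeth
    sequence refinement module."""
    side, jaw, index = position[0], position[1], int(position[3:])
    return {
        "branch1_is_tooth": 1,                  # tooth vs. background
        "branch2_jaw": 0 if jaw == "U" else 1,  # upper (U) vs. down (D)
        "branch3_position": index - 1,          # T1..T8 -> classes 0..7
        "deferred_side": side,                  # R/L, recovered in post processing
    }

print(reconstruct_label("RUT3"))
# {'branch1_is_tooth': 1, 'branch2_jaw': 0, 'branch3_position': 2, 'deferred_side': 'R'}
```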
On the training stage, the total loss function of our multi-task CNN is defined as

loss_{total} = loss_{b1} + I[c = 1]\,(\lambda_{b2}\, loss_{b2} + \lambda_{b3}\, loss_{b3} + \lambda_{b4}\, loss_{b4})    (1)

where loss_{b1}, loss_{b2} and loss_{b3} are the loss functions of the three classification branches, for which the cross entropy loss is used. loss_{b4} is the box regression loss; we use the smooth L1 loss (Ren et al., 2015), the same as in faster R-CNN. λ_{b2}, λ_{b3} and λ_{b4} are weight parameters. I is the indicator function, whose value is 0 or 1: the latter three loss items are only calculated when the groundtruth label of a proposal candidate is tooth (c = 1); otherwise they are ignored. The back propagation algorithm is used to update the network parameters. On the test stage, we only consider the bounding boxes judged as teeth by the first branch, and we get the other marks and the bounding box predictions from the other three branches. This information is combined and sent to the teeth sequence refinement module for post processing.
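As a rough illustration of Eq. (1), the following sketch computes the total loss for a single proposal candidate. It assumes softmax probability vectors from the three classification branches and a precomputed smooth L1 box loss; all names are hypothetical.

```python
import numpy as np

def cross_entropy(probs, label):
    """Cross entropy for one proposal, given predicted class probabilities."""
    return -np.log(probs[label] + 1e-12)

def total_loss(p1, p2, p3, box_loss, gt, lam2=1.0, lam3=1.0, lam4=1.0):
    """Eq. (1): the tooth/background loss is always counted; the jaw, position
    and box terms only when the groundtruth is a tooth (c = 1)."""
    loss = cross_entropy(p1, gt["branch1_is_tooth"])
    if gt["branch1_is_tooth"] == 1:             # indicator I[c = 1]
        loss += lam2 * cross_entropy(p2, gt["branch2_jaw"])
        loss += lam3 * cross_entropy(p3, gt["branch3_position"])
        loss += lam4 * box_loss
    return loss
```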


2.4. Proposal correlation module

As mentioned before, in our multi-task CNN, after we get the final feature map F from the convolutional backbone, the RPN with the ROIPooling layer extracts a fixed number N of proposal candidates as feature vectors f_t^n (n = 1, 2, ..., N). In dental X-rays the teeth are correlated with each other: each tooth is connected with its neighbor teeth. In other words, once we find a tooth in the input X-ray, there always exists another tooth on one side of it.


Fig. 4. Our proposal correlation module and its correlation units. FL denotes a fully connected layer.

This information is hardly seen in natural image applications, and it can be used as a pattern to find the neighbor teeth when there is little global information in dental periapical X-rays. Therefore, we aim to model this correlation. We design a proposal correlation module accordingly and insert it into our multi-task CNN, as shown in Fig. 4(a).

Since we cannot know the labels of the proposal candidates in advance, especially on the test stage, we utilize the information between each pair of proposal candidates. In our proposal correlation module, all the feature vectors f_t^n (n = 1, 2, ..., N) are first sent into a fully connected layer to generate an output feature vector f_a^n (n = 1, 2, ..., N) for each proposal candidate. Then these feature vectors are sent to M correlation units simultaneously. Each correlation unit l aims to learn a correlation pattern f_c^{ln} (l = 1, 2, ..., M) by using all of the f_a^n (n = 1, 2, ..., N), which can represent one kind of position relationship. We choose the output pattern with the maximum response from the correlation units, so that the final output of our proposal correlation module can be described as:

f_o^n = f_a^n + \max_{l \in [1, M]} f_c^{ln}, \quad n = 1, 2, \ldots, N    (2)

For the correlation unit design, we extract the information between proposal candidates in a form similar to the self-attention mechanism (Vaswani et al., 2017). A query matrix Q_l, a key matrix K_l and a value matrix V_l with trainable parameters are set for each correlation unit l. Besides the information between proposal candidates, the environment should also affect each proposal candidate. Therefore, we apply a global pooling operation to the feature map F to get a global feature vector g_f, and combine it with the feature vectors of the proposal candidates:

f_s^n = f_a^n + g_f, \quad n = 1, 2, \ldots, N    (3)

After we get f_s^n (n = 1, 2, ..., N), for proposal candidate n the output of correlation unit l (l = 1, 2, ..., M) can be described as:

f_c^{ln} = \sum_{m \in [1, N]} w^{lmn} \cdot (V_l \cdot f_s^m), \quad n = 1, 2, \ldots, N    (4)

where w^{lmn} is the attention weight, whose summation over index m is 1; that is, the influence of the other proposal candidates is weighted by the attention weights. The w^{lmn} should be related to both the features and the positions of the proposal candidates, so we consider both the feature vectors and the relative positions of the proposals when generating the attention weights.


Fig. 5. A visualized example of our system procedure.

For the interaction of the feature vectors f_s^m and f_s^n, we use

w_f^{lmn} = DP(K_l \cdot f_s^m, \; Q_l \cdot f_s^n)    (5)

to get the scalar weight w_f^{lmn}, where DP denotes the dot product. The positions p_m and p_n of proposal candidates m and n predicted by the RPN are described as [x_m, y_m, h_m, w_m] and [x_n, y_n, h_n, w_n]. Their relative position is more important than their absolute positions, so we combine the two positions into a relative position vector

p_r^{mn} = \left[ \frac{|x_m - x_n|}{w_m}, \; \frac{|y_m - y_n|}{h_m}, \; \frac{w_m}{w_n}, \; \frac{h_m}{h_n} \right]    (6)

For better representation, the embedding method in Vaswani et al. (2017) and Hu et al. (2018) transforms the position vector p_r^{mn} into a higher-dimensional vector p_h^{mn} for the follow-up processing. Then we use a trainable vector P^l to transform p_h^{mn} into a scalar weight, calculated as

w_p^{lmn} = P^l \cdot p_h^{mn}    (7)

After we get the feature information w_f^{lmn} and the position information w_p^{lmn}, we calculate the attention weight w^{lmn} with a softmax-like function, defined as

w^{lmn} = \frac{w_p^{lmn} \cdot \exp(w_f^{lmn})}{\sum_{i=1}^{N} w_p^{lin} \cdot \exp(w_f^{lin})}    (8)

After we get all the attention weights, we can compute f_c^{ln} in each correlation unit l and generate the output f_o^n of the proposal correlation module. These feature vectors are sent to the multiple branches for the final recognition. The trainable parameters of the correlation units are randomly initialized so that the units differ from each other. During the network training stage, the parameters of the proposal correlation module are updated directly through the back propagation algorithm.
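Putting Eqs. (2)–(8) together, the following NumPy sketch outlines one possible forward pass of the module. The tensor layout (row-vector proposal features of shape (N, d)), the packaging of each unit's parameters as a dict, the element-wise reading of the max in Eq. (2), and the clipping of the position weight at zero (borrowed from Hu et al., 2018, whose formulation the module resembles) are our assumptions, not the authors' code.

```python
import numpy as np

def position_weight(box_m, box_n, P, embed):
    """Eqs. (6)-(7): relative position vector, embedded and projected to a scalar."""
    x_m, y_m, h_m, w_m = box_m
    x_n, y_n, h_n, w_n = box_n
    p_r = np.array([abs(x_m - x_n) / w_m, abs(y_m - y_n) / h_m,
                    w_m / w_n, h_m / h_n])                    # Eq. (6)
    return P @ embed(p_r)                                     # Eq. (7)

def correlation_unit(F_s, boxes, Q, K, V, P, embed):
    """Eqs. (4)-(8) for one unit: attention-weighted aggregation of
    value-projected proposal features."""
    N = F_s.shape[0]
    w_f = (F_s @ K.T) @ (F_s @ Q.T).T                         # Eq. (5): w_f[m, n]
    w_p = np.array([[position_weight(boxes[m], boxes[n], P, embed)
                     for n in range(N)] for m in range(N)])
    w_p = np.maximum(w_p, 0)           # assumed clipping, as in Hu et al. (2018)
    scores = w_p * np.exp(w_f)                                # Eq. (8), over index m
    w = scores / (scores.sum(axis=0, keepdims=True) + 1e-12)
    return w.T @ (F_s @ V.T)                                  # Eq. (4): all f_c^{ln}

def proposal_correlation_module(F_a, g_f, boxes, units, embed):
    """Eqs. (2)-(3): add the global feature, run all M units, and keep the
    strongest unit response for each proposal."""
    F_s = F_a + g_f                                           # Eq. (3)
    F_c = np.stack([correlation_unit(F_s, boxes, **u, embed=embed)
                    for u in units])                          # shape (M, N, d)
    return F_a + F_c.max(axis=0)                              # Eq. (2), element-wise
```

Under this reading, each unit can specialize in one spatial pattern (for example, "a tooth directly to the left"), and the maximum in Eq. (2) keeps, for each proposal, only the pattern that responds most strongly.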
location. We add a special placeholder, marked as X, which represents
the missing tooth in the teeth sequence. When we use the teeth template
to calculate the difference according to formula 10, this placeholder is
2.5. Teeth sequence refinement module
always regarded as a right one. We use an example to describe the whole
process, if all the wi is set to 1 and there are five teeth detected in an
After we get the teeth information from multiple branches, we can
input X-ray, including a missing tooth. If the ordered teeth sequence is
combine these information to get the teeth sequence, arranged by the x-
judged to T3, T2, T1, X, T1 by multi-task CNN, it is easy to know that the
coordinate of bounding box center. We design a teeth sequence refine­
matching template is T3, T2, T1, T1, T2 by calculation of difference, and a
ment module for post processing. In periapical X-rays, the teeth in an


Fig. 6. Some visualized results of our system output.

According to the order of teeth, the sequence can be further marked as RT3, RT2, RT1, X, LT2, so that the R or L information is inserted. With U or D judged by formula (9), we can get the precise label among the 32 teeth positions for each tooth and finish the whole teeth recognition task.
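A compact sketch of the two refinement steps follows; the helper names are hypothetical and the template list is only an illustrative subset of the valid sequences.

```python
def jaw_judgement(jaw_marks, confidences):
    """Eq. (9): a positive value relabels all teeth as upper, otherwise down."""
    up = [c for j, c in zip(jaw_marks, confidences) if j == "U"]
    down = [c for j, c in zip(jaw_marks, confidences) if j == "D"]
    return (len(up) + sum(up)) - (len(down) + sum(down))

def match_template(pred, templates, weights):
    """Eq. (10): weighted mismatch count; the placeholder X always counts as
    right, so it never contributes to the difference."""
    def difference(template):
        return sum(w for c, t, w in zip(pred, template, weights)
                   if c != "X" and c != t)
    return min(templates, key=difference)

# The worked example from the text: all w_i = 1, five slots, one missing tooth.
pred = ["T3", "T2", "T1", "X", "T1"]
templates = [["T4", "T3", "T2", "T1", "T1"],   # illustrative subset of the
             ["T3", "T2", "T1", "T1", "T2"],   # valid five-slot templates
             ["T2", "T1", "T1", "T2", "T3"]]
print(match_template(pred, templates, [1, 1, 1, 1, 1]))
# ['T3', 'T2', 'T1', 'T1', 'T2']: the last T1 is corrected to T2.
```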
3. Experiments and discussion

3.1. Parameters setting and system output

The experiments run on Ubuntu 14.04 with a 1080Ti GPU, and the software used is Python with MXNet. For network training, the stochastic gradient descent (SGD) optimizer with momentum is used. The initial learning rate and momentum are set to 0.001 and 0.9 respectively. The batch size is set to 32 and the number of training epochs is set to 200. The λ_{b2}, λ_{b3} and λ_{b4} in formula (1) are simply set to 1. For the proposal correlation module, M and N in formula (2) are set to 16 and 1000 respectively. For the teeth sequence refinement module, the w_i in formula (10) is set to the proportion of each tooth position in the training data; a larger w_i means that the relevant tooth position occurs more often and is more credible in the templates. We also use a model pre-trained on ImageNet to initialize the parameters of our multi-task CNN.

An example of our system procedure is shown in Fig. 5. The information about each tooth position is first obtained by the individual branches. We combine this information into the teeth sequence and send it to the teeth sequence refinement module to get the final output. Our system can not only provide dentists with comprehensive and reliable diagnostic support, but also provide visualized intermediate results for analysis in case the final results are wrong.

3.2. Evaluation and comparisons

We conduct extensive experiments on our proposed framework. We first show some visualized results output by our framework in Fig. 6; our system obtains right results for different teeth positions. For quantitative evaluation, the widely used metrics are precision, recall and F-score, defined as below:

precision = \frac{TP}{TP + FP}    (11)


recall = \frac{TP}{TP + FN}    (12)

F\text{-}score = \frac{2 \cdot precision \cdot recall}{precision + recall}    (13)

where TP, TN, FP and FN denote the numbers of true positive, true negative, false positive and false negative objects respectively. We use the intersection over union (IOU) between bounding boxes to count the true positive predictions. The IOU of two boxes X and Y is defined as

IOU = \frac{area(X \cap Y)}{area(X \cup Y)}    (14)

The box predictions output by our relation-based framework are compared to the groundtruth. If the IOU of a box prediction and the relevant groundtruth is larger than a threshold, then it is judged as a right result. We set the IOU threshold to 0.5 in our experiments.
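For reference, a direct implementation of Eq. (14) might look as follows, assuming axis-aligned boxes in corner format [x1, y1, x2, y2] (the format is our choice for illustration):

```python
def iou(box_a, box_b):
    """Eq. (14) for two axis-aligned boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou([0, 0, 10, 10], [5, 0, 15, 10]))  # 0.333..., below the 0.5 threshold
```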
We first evaluate the performance of our multiple branches and the teeth sequence refinement module. Based on the teeth classification and location results, we calculate the precision, recall and F-score of the three classification branches and the teeth sequence refinement module. Table 1 reports the performance of these intermediate nodes in our relation-based framework.

Table 1
The performance of the intermediate nodes in our relation-based framework.

            Branch 1   Branch 2   Branch 3   Teeth sequence refinement module
Precision   0.968      0.956      0.790      0.951
Recall      0.971      0.958      0.798      0.955
F-score     0.969      0.957      0.794      0.953

According to the results, if we only judge whether a proposal candidate is a tooth or background (branch 1), or whether it is an upper tooth or a down tooth (branch 2), the training data is enough to get high precision and recall, because there are only two classes for these branches. For the third branch, the precision and recall decrease because the number of classes increases sharply to 8 (T1 to T8), which is large compared to the amount of training data and leads to a performance drop. It also indicates that direct classification over 32 teeth positions cannot achieve good performance due to insufficient network training, as the following experiments show, so it is necessary to decompose the teeth classification task. Finally, the teeth sequence refinement module corrects a large number of wrong classification results according to formulas (9) and (10), and the final performance increases to a high level again. In fact, in most cases, if the number of right teeth is larger than the number of wrong teeth, the teeth sequence refinement module can correct the wrong classification results and improve the performance a lot. If the number of wrong teeth is no less than the number of right teeth, all the teeth in the sequence may be judged wrongly. However, this case seldom happens in our experiments, because each classification branch performs well, so the probability that more than one tooth in an X-ray is predicted wrongly is very low. In fact, we only find that X-rays containing just two teeth can lead to this case: if one of the two teeth is predicted wrongly, the other tooth may also be judged wrongly. This case does not have a large influence on our final results.

We also evaluate the performance of our proposal correlation module, which utilizes the information between proposal candidates to get better teeth classification and location performance. We remove this module to compare the performance, and we also evaluate some variations, such as using the average response, changing the number of correlation units and removing the global feature. The experiment results are shown in Table 2.

Table 2
The comparison of our proposal correlation module and its variants. GF means the global feature generated by the global pooling operation, listed separately. PC is our proposed proposal correlation module. PC_a^t and PC_m^t denote the proposal correlation module with average response and with maximum response, respectively, using t correlation units.

               Precision   Recall   F-score
Without PC     0.911       0.914    0.912
PC^1           0.932       0.935    0.933
PC_a^16        0.940       0.942    0.941
PC_m^16        0.946       0.949    0.947
PC_a^16 + GF   0.943       0.944    0.943
PC_m^4 + GF    0.939       0.942    0.940
PC_m^8 + GF    0.946       0.950    0.948
PC_m^16 + GF   0.951       0.955    0.953

From the results, inserting the proposal correlation module makes the teeth recognition performance much better than without it, because we utilize the location information between teeth positions and each proposal candidate can influence its neighbors. The ablation study shows that when the number of correlation units M increases, we get a further performance improvement; we believe this is because each correlation unit can output a pattern that represents one kind of relationship between teeth positions. By comparison, the maximum response over the correlation units is better than the average response. By combining the proposal features and the global feature, we get better results, which means that the global information is helpful for the local proposal candidates: each proposal candidate can interact with the global feature vector to obtain more global information.

For further evaluation, we compare our relation-based framework with some famous detection baselines, namely Fast R-CNN (Girshick, 2015), Faster R-CNN (Ren et al., 2015) and R-FCN (Dai et al., 2016). These structures achieve very good performance in many natural image applications, so we use them to detect the 32 teeth positions directly. For each structure, since the teeth sequence refinement module can check the output teeth sequence and correct a large number of wrong classification results, we also insert this module into these baselines for comparison. Meanwhile, some previous attempts at teeth location and numbering in dental periapical X-rays, namely Zhang et al. (2018) and Chen et al. (2019), are also compared with our relation-based framework. All of these methods are evaluated on our dataset. The results are shown in Table 3.

Table 3
The comparisons of different methods. TSRM means the teeth sequence refinement module.

                               Precision   Recall   F-score
Fast R-CNN                     0.629       0.595    0.612
Faster R-CNN                   0.672       0.595    0.631
R-FCN                          0.718       0.487    0.580
Fast R-CNN + TSRM              0.861       0.762    0.808
Faster R-CNN + TSRM            0.890       0.785    0.834
R-FCN + TSRM                   0.913       0.609    0.731
Zhang et al. (2018)            0.893       0.897    0.895
Chen et al. (2019)             0.904       0.909    0.906
Our relation-based framework   0.951       0.955    0.953

According to the results, our relation-based framework achieves higher teeth classification and location performance than these previous works. Compared to the famous detection structures, although they use the same training data as our relation-based framework, our proposed method is much better than these direct methods. The improvement mainly benefits from our multiple-branches design: even with the correction of the teeth sequence refinement module, the direct methods cannot reach the performance of our divide-and-conquer method. This is because the famous detection structures cannot efficiently extract the relationship between teeth and other prior information under lack of training data and a large number of teeth positions, which leads to insufficient network training.


By using the label reconstruction technique, we insert the prior knowledge into our framework. We decompose the teeth classification task manually and insert the proposal correlation module, and our relation-based framework extracts the tooth position information through multiple branches. Each branch classifies one part of the whole tooth label, which has fewer classes under lack of data and is easier to train. Meanwhile, more prior knowledge can easily be inserted into the teeth sequence refinement module, which also improves the teeth recognition performance a lot. Therefore, our method achieves very good results for teeth recognition.

In dental practice, the teeth positions from T1 to T8 represent different types of teeth, including a wisdom tooth, two molars, two premolars, a canine, and two incisors. Their structures are different and should be treated accordingly, so we also evaluate the performance of teeth recognition for each tooth position. The results are shown in Table 4. From the results, our relation-based framework achieves consistent performance across the different teeth positions. This means that our relation-based framework can indeed provide efficient and reliable teeth recognition results for dentists, which is very helpful for diagnosis support.

Table 4
The performance of different teeth positions.

      Precision   Recall   F-score
T1    0.921       0.916    0.918
T2    0.950       0.981    0.965
T3    0.978       0.923    0.950
T4    0.935       0.969    0.952
T5    0.948       0.971    0.959
T6    0.924       0.931    0.927
T7    0.963       0.928    0.945
T8    0.973       0.948    0.960

4. Conclusion

In this paper, we propose a novel relation-based framework for teeth recognition in dental periapical X-rays, including teeth location and classification. Lack of data is a common situation in the medical field. To handle this problem efficiently, we first analyze the prior knowledge and design the whole framework accordingly. We apply the label reconstruction technique to decompose the teeth classification task and design a multi-task CNN, we design a proposal correlation module that exploits the location information between teeth positions, and a teeth sequence refinement module is used for post processing. Our relation-based framework achieves much better teeth recognition performance than previous works and famous direct detection methods, and it can provide reliable and comprehensive diagnostic support for dentists. In future work, we will continue to improve the teeth recognition performance and extend our method to similar tasks.

CRediT authorship contribution statement

Kailai Zhang: Conceptualization, Methodology, Software, Formal analysis, Investigation, Writing, Visualization. Hu Chen: Conceptualization, Validation, Investigation, Data curation. Peijun Lyu: Supervision, Resources, Project administration. Ji Wu: Supervision, Resources, Project administration.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work is sponsored by the National Natural Science Foundation of China (Grant No. 61571266), Beijing Municipal Natural Science Foundation (No. L192026) and Tsinghua-Foshan Innovation Special Fund (TFISF) (No. 2020THFS0111).

References

Aeini, F., Mahmoudi, F., 2010. Classification and numbering of posterior teeth in bitewing dental images. In: 2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE), vol. 6, pages V6–66. IEEE.
Arbelaez, P., Maire, M., Fowlkes, C., Malik, J., 2011. Contour detection and hierarchical image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 33 (5), 898–916.
Boykov, Y., 2001. Interactive graph cuts for optimal boundary and region segmentation of objects in n-d images. In: Proceedings of the IEEE Conference on Computer Vision.
Carreira, J., Sminchisescu, C., 2011. Constrained parametric min-cuts for automatic object segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 34 (7), 3241–3248.
Chen, H., Zhang, K., Lyu, P., Li, H., Zhang, L., Wu, J., Lee, C.-H., 2019. A deep learning approach to automatic teeth detection and numbering based on object detection in dental periapical films. Sci. Rep. 9 (1), 3840.
Corrêa, H.S.D., Brescia, G., Cortellini, V., Verzeletti, A., 2020. Human identification through dna analysis of restored postmortem teeth. Forensic Sci. Int. Genet. 47, 102302.
Dai, J., Li, Y., He, K., Sun, J., 2016. R-fcn: object detection via region-based fully convolutional networks. In: Advances in Neural Information Processing Systems, 379–387.
Daniels, Troy, E., 1984. Human mucosal langerhans cells: postmortem identification of regional variations in oral mucosa. J. Investig. Dermatol. 82 (1), 21–24.
Dey, R., Hong, Y., 2018. Compnet: complementary segmentation network for brain mri extraction. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, pp. 628–636.
Felzenszwalb, P.F., Huttenlocher, D.P., 2004. Efficient graph-based image segmentation. Int. J. Comput. Vis. 59 (2), 167–181.
Gao, H., Zhuang, L., Van Der Maaten, L., Weinberger, K.Q., 2017. Densely connected convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition.
Girshick, R., 2015. Fast r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, 1440–1448.
He, K., Zhang, X., Ren, S., Jian, S., 2016. Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition.
Hu, H., Gu, J., Zhang, Z., Dai, J., Wei, Y., 2018. Relation networks for object detection. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Jae-Hong, Lee, Do-hyung, Kim, Seong-Nyum, Jeong, Seong-Ho, Choi, 2018. Diagnosis and prediction of periodontally compromised teeth using a deep learning-based convolutional neural network algorithm. J. Periodontal Implant Sci. 48 (2), 114–123.
Kumar, Rajiv, 2016. Teeth recognition for person identification. In: International Conference on Computation System and Information Technology for Sustainable Solutions, 13–16.
Li, S., Fevens, T., Krzyżak, A., Song, L., 2006. An automatic variational level set segmentation framework for computer aided dental x-rays analysis in clinical environments. Comput. Med. Imaging Graph. 30 (2), 65–74.
Li, Q., Cai, W., Wang, X., Zhou, Y., Feng, D.D., Chen, M., 2014. Medical image classification with convolutional neural network. In: International Conference on Control Automation Robotics and Vision.
Liao, H., Tang, Y., Funka-Lea, G., Luo, J., Zhou, S.K., 2018. More knowledge is better: cross-modality volume completion and 3d+2d segmentation for intracardiac echocardiography contouring. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, pp. 535–543.
Lin, P.L., Huang, P.Y., Huang, P.W., Hsu, H.C., Chen, C.C., 2014. Teeth segmentation of dental periapical radiographs based on local singularity analysis. Comput. Methods Prog. Biomed. 113 (2), 433–445.
Lin, P.L., Lai, Y.H., Huang, P.W., 2010. An effective classification and numbering system for dental bitewing radiographs using teeth region and contour information. Pattern Recognit. 43 (4), 1380–1392.
Mahdi, F.P., Motoki, K., Kobashi, S., 2020. Optimization technique combined with deep learning method for teeth recognition in dental panoramic radiographs. Sci. Rep. 10 (1).
Mahoor, M.H., Abdel-Mottaleb, M., 2005. Classification and numbering of teeth in dental bitewing images. Pattern Recognit. 38 (4), 577–586.
Motoki, K., Mahdi, F.P., Yagi, N., Nii, M., Kobashi, S., 2020. Automatic teeth recognition method from dental panoramic images using faster r-cnn and prior knowledge model. In: 2020 Joint 11th International Conference on Soft Computing and Intelligent Systems and 21st International Symposium on Advanced Intelligent Systems (SCIS-ISIS).
Nomir, O., Abdel-Mottaleb, M., 2005. A system for human identification from x-ray dental radiographs. Pattern Recognit. 38 (8), 1295–1305.
Nomir, O., Abdel-Mottaleb, M., 2007. Human identification from dental x-ray images based on the shape and appearance of the teeth. IEEE Trans. Inf. Forensics Secur. 2 (2), 188–197.


Nosrati, M.S., Hamarneh, G., 2016. Incorporating prior knowledge in medical image segmentation: a survey. arXiv:1607.01092.
Rad, A.E., Shafry, M., Rahim, M., Norouzi, A., 2013. Digital dental x-ray image segmentation and feature extraction. Telkomnika Indonesian J. Electr. Eng. 11 (6), 3109–3114.
Rana, A., Yauney, G., Wong, L.C., Gupta, O., Muftu, A., Shah, P., 2017. Automated segmentation of gingival diseases from oral images. In: 2017 IEEE Healthcare Innovations and Point of Care Technologies (HI-POCT). IEEE, pp. 144–147.
Ren, S., He, K., Girshick, R., Sun, J., 2015. Faster r-cnn: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, 91–99.
Said, E.H., Nassar, D.E.M., Fahmy, G., Ammar, H.H., 2006. Teeth segmentation in digitized dental x-ray films using mathematical morphology. IEEE Trans. Inf. Forensics Secur. 1 (2), 178–189.
Sanders, A.E., Slade, G.D., Lim, S., Reisine, S.T., 2010. Impact of oral disease on quality of life in the US and Australian populations. Community Dent. Oral Epidemiol. 37 (2), 171–181.
Shah, S., Abaza, A., Ross, A., Ammar, H., 2006. Automatic tooth segmentation using active contour without edges. In: Biometric Consortium Conference.
Simonyan, K., Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556.
Tangel, M.L., Fatichah, C., Fei, Y., Janet, P.B., Hirota, K., 2013. Dental classification for periapical radiograph based on multiple fuzzy attribute. In: 2013 IFSA World Congress NAFIPS Annual Meeting.
Tohnak, S., Mehnert, A.J.H., Mahoney, M., Crozier, S., 2007. Synthesizing dental radiographs for human identification. J. Dent. Res. 86 (11), 1057–1062.
Tuzoff, D.V., Tuzova, L.N., Bornstein, M.M., Krasnov, A.S., Kharchenko, M.A., Nikolenko, S.I., Sveshnikov, M.M., Bednenko, G.B., 2019. Tooth detection and numbering in panoramic radiographs using convolutional neural networks. Dentomaxillofac. Radiol.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I., 2017. Attention is all you need. arXiv.
Yuniarti, A., 2012. Classification and numbering of dental radiographs for an automated human identification system. Telkomnika 10 (1).
Zhang, K., Wu, J., Chen, H., Lyu, P., 2018. An effective teeth recognition method using label tree with cascade network structure. Comput. Med. Imaging Graph. 68, 61–70.
Zhang, J., Hu, H., Chen, S., Huang, Y., Qiu, G., 2017. Cancer cells detection in phase-contrast microscopy images based on faster r-cnn. In: International Symposium on Computational Intelligence and Design.
Zhou, J., Abdel-Mottaleb, M., 2005. A content-based system for human identification based on bitewing dental x-ray images. Pattern Recognit. 38 (11), 2132–2142.
