Wei Chai
Department of Orthopaedics
Chinese PLA General Hospital
Beijing, China
email: royal1860@sina.cn
Abstract—To address the unstable classification of osteonecrosis of the femoral head, a classification method based on deep learning is proposed. A 3D U-net is used to segment the CT data to obtain the complete femoral region. A Convolutional Autoencoder is used for dimensionality reduction and feature extraction, yielding the deep information of the segmented femur. Finally, K-means is used to cluster the extracted features and obtain the result. Experiments on 120 patient cases show that all four indexes used to evaluate the segmentation, including the Dice coefficient, are higher than 94%. The peak signal-to-noise ratio, adopted to compare the input and output images of the Convolutional Autoencoder, is higher than 34 dB. The average inter-class similarity and intra-class similarity of the classification method are 51% and 87%, respectively, indicating large differences between the types of femurs and high similarity within each class. The proposed method has good clinical application value for individualized treatment and can assist doctors in carrying out more targeted treatment for patients with early osteonecrosis of the femoral head.

Keywords—Osteonecrosis of Femoral Head, 3D U-net, Convolutional Autoencoder, K-means Clustering

I. INTRODUCTION
Osteonecrosis of the Femoral Head (ONFH) is a common complication in orthopaedics, probably caused by the death of bone cells and bone marrow components, and the subsequent cell repair reaction, following damage to or interruption of the femoral head blood supply. In more severe cases, it can lead to joint pain and joint dysfunction [1]. ONFH can result from many other causes as well, such as hip trauma, corticosteroid use, alcoholism, decompression sickness and sickle cell anemia [1]. If the articular surface collapses, the physician usually has to replace the joint, which is both costly and risky. Hence, it is particularly important to study the classification of early ONFH and to develop individualized treatment plans for patients.

Among previous ONFH classification systems, those of Marcus [2], Ficat and Arlet [3], Steinberg [4] and the University of Pennsylvania [5] were all based on X-ray images and clinical symptoms, and have some limitations. Currently, a relatively ideal staging method is the international ONFH standard provided by the Association Research Circulation Osseous (ARCO) in 1992, which includes Magnetic Resonance Imaging (MRI) findings and an assessment of the degree of femoral head involvement [6]. There are also several other ONFH classification methods, such as the Steinberg classification [4], the improved Kerboul classification [7], the Japanese Investigation Committee (JIC) classification [8] and the China-Japan Friendship Hospital (CJFH) three-pillar structure classification [9].

In the ARCO staging, ONFH is divided into four stages according to its scope and the degree of femoral head collapse [6]. The Steinberg classification categorizes ONFH into three groups based on the proportion of necrosis in the total femoral head volume [4], while the improved Kerboul classification divides it into three categories based on the angles on the coronal and sagittal planes of the femoral head [7]. In the JIC classification, the femoral head is separated into three parts on the coronal plane, and ONFH is grouped into four types according to the extent of the necrotic area [8]. Similarly, the CJFH classification divides the femoral head into three parts on the coronal plane, and ONFH has five categories according to the extent of the necrotic area [9]. Li et al. observed the natural course of ONFH for many years and concluded that the CJFH classification is highly accurate in predicting mild or severe ONFH but less accurate for ONFH of intermediate extent. As demonstrated in Ref. [10], the inter-observer consistency of the JIC classification is good, while that of the Steinberg classification and the improved Kerboul classification is relatively inferior. Although the JIC classification gives accurate results in predicting ONFH, its accuracy on moderate lesions is still low.

Existing classifications of osteonecrosis of the femoral head have certain limitations: they depend mainly on the personal experience of doctors, they are not stable enough, and their results are not detailed enough to characterize the disease and its pathological changes. They also cannot predict the prognosis of patients accurately enough. However, there are few reports on combining the diagnosis of osteonecrosis of the femoral head with big data and artificial intelligence.

Aiming at the instability of traditional classification methods, a new method based on deep learning is proposed in this paper:
each of which is followed by a ReLU. At the last layer, a 1×1×1 convolution reduces the number of output channels to the number of labels. In this paper, we set the number of labels to 2 in order to distinguish the femoral region from the background area.

Fig. 1. 3D U-net network architecture (legend: concat; conv (+BN) + ReLU; max pooling; up-conv; conv)

Fig. 2. Convolutional Autoencoder network structure (labels: Conv layers with kernel 3×3×3 or 5×5×5 and stride 2; feature maps 4×4×4×512, 2×2×2×1024 and 1×1×1×2048)

C. K-means Clustering
Clustering analysis divides objects into meaningful groups, which helps in further understanding and describing the data. The definition of a cluster was first
proposed by Everitt in 1974, which requires that the objects in a group be relatively compact in space, and that the spacing between any two objects in the group be less than that between any two objects in different groups. K-means is an unsupervised, prototype-based algorithm. It evaluates similarity using the distance between objects: the closer two objects are, the more similar they are [13].

The main advantages of K-means clustering are that it is efficient, concise, scalable and effective on large datasets. In our work, K-means clustering is applied to the deep features extracted by the Convolutional Autoencoder, which yields an unsupervised classification result for early ONFH. Our experimental results below show a high intra-class similarity and a significant inter-class difference.

III. EXPERIMENT
The femur CT data, provided by the Chinese PLA General Hospital, contain 120 cases with 128 CT slices per case. The data were desensitized in batch by removing basic patient information such as name and identification number. The patients are 45 to 65 years old. Ten percent of the data is randomly selected as the validation set to explore the best parameters for the model, another 10 percent is selected as the test set, and the remainder is used as the training set.

A. Femoral segmentation
In our experiment, 3D U-net is used to segment the 3-D CT slices of the femur. We adopt the Adam algorithm for gradient descent (β1 = 0.5, β2 = 0.999), with a learning rate of 0.0002, a mini-batch size of 1, and 300 iterations. We implement our models with TensorFlow and train them on an NVIDIA DGX-1V machine equipped with 8 Tesla V100 GPUs and 512 GB of system memory.

Figure 3 shows the segmentation results for different parts of the CT slices. Image is the original CT slice, Ground truth is the manually marked region of the femur, and Result is the region of the femur segmented by 3D U-net.

[Figure panels, row Upper: (a) Image, (b) Ground Truth, (c) Result]
Fig. 3. 3D U-net segmentation of the femur region results

For the evaluation of 3D U-net performance, four evaluation indexes are used: Dice, Recall, Precision and F-measure. The specific definitions are as follows:

$$\mathrm{Dice} = \frac{2\,|X \cap Y|}{|X| + |Y|}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$F\text{-measure} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where TP is the number of femoral pixels correctly segmented, TN is the number of background pixels correctly segmented, FP is the number of background pixels incorrectly segmented as femoral area, and FN is the number of femoral pixels incorrectly labeled as background. X and Y represent the segmentation results of the physicians and the network, respectively. The average values for different parts of the CT slices are reported in Table 1.

TABLE I. 3D U-NET SEGMENTATION RESULT
Region    Dice      Precision   Recall    F-measure
Upper     0.9521    0.9645      0.9624    0.9518
Middle    0.9544    0.9436      0.9411    0.9602
Lower     0.9602    0.9589      0.9612    0.9601

As shown in Table 1, the segmentation results of 3D U-net for the femoral region in CT slices are higher than 0.94 on Dice, Recall, Precision and F-measure. The experimental results show that the 3D U-net used in this paper can effectively segment the femoral region in different slices.

B. Feature extraction
The Convolutional Autoencoder is used to reduce the dimensionality of the femur data and extract its features. Figure 4 shows the results of our model on three femoral CT images.

[Fig. 4 panel rows: Input, Output]

$$\mathrm{MSE} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[ I(i,j) - K(i,j) \right]^2$$

$$\mathrm{PSNR} = 10 \log_{10}\left( \frac{MAX_I^2}{\mathrm{MSE}} \right)$$

where I and K represent two images with m × n dimensions.
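As a minimal sketch of the MSE and PSNR formulas, assuming the two images are NumPy arrays of equal shape: the function names `mse` and `psnr` and the default `max_i=255.0` (8-bit grayscale) are illustrative assumptions, not taken from the paper's implementation.

```python
import numpy as np


def mse(i_img, k_img):
    """Mean square error between two images of identical shape."""
    i_img = np.asarray(i_img, dtype=np.float64)
    k_img = np.asarray(k_img, dtype=np.float64)
    if i_img.shape != k_img.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean((i_img - k_img) ** 2))


def psnr(i_img, k_img, max_i=255.0):
    """PSNR in dB; max_i is the assumed maximum grayscale value."""
    err = mse(i_img, k_img)
    if err == 0.0:
        # Identical images: the ratio is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_i ** 2 / err))


if __name__ == "__main__":
    original = np.zeros((4, 4))
    reconstructed = original + 5.0  # uniform error of 5 grey levels
    print(mse(original, reconstructed), psnr(original, reconstructed))
```

Because `np.mean` averages over every element, the same sketch applies unchanged to a 3-D volume, where the normalisation runs over all voxels instead of m × n pixels.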
MSE is the mean square error, and $MAX_I$ represents the maximum grayscale value of the image. The average PSNR over all slices for several typical cases is shown in Table 2.

TABLE II. THE AVERAGE OF PSNR FOR INDIVIDUAL CASES
Case No.     1        2        3        4
PSNR (dB)    34.234   35.876   34.621   34.761

Table 2 indicates that the PSNR between the input and the output of the Convolutional Autoencoder is higher than 34 dB in every listed case, showing that the reconstructions are essentially similar to the inputs. The experimental results show that feature extraction based on the 3-D Convolutional Autoencoder is feasible.

C. Clustering
In the previous subsections, we reduced the dimensionality of the 3-D femoral data and extracted features with the Convolutional Autoencoder. Compared with the raw data, the 2048-dimensional features better reflect the deep implicit information of the data. For clustering, we first select an appropriate k using the Silhouette Coefficient (SC),

$$SC = \frac{|a - b|}{\max(a, b)}$$

where a is the mean distance between a sample and the other samples within its class, and b is the mean distance between a sample within the class and the samples in other classes. Figure 5 shows the Silhouette Coefficients for different k values.

Fig. 5. Silhouette Coefficients for different k values

D. Result analysis
Cosine similarity is used to quantitatively evaluate the inter-class difference and intra-class similarity in the clustering results,

$$\mathrm{sim}(x, y) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$

$$\mathrm{SIM}(C_1, C_2) = \frac{1}{N_1 N_2} \sum_{x \in C_1} \sum_{y \in C_2} \mathrm{sim}(x, y)$$

where x and y represent the feature vectors of two different images, $C_i$ represents a category generated by clustering, and $N_i$ is the number of samples contained in class $C_i$.

To quantify the experimental results, we select the slice data of the upper, middle and lower segments of the femur to calculate the cosine similarity. Figure 7 shows the cosine similarity of the K-means clustering results.

[Fig. 7 (a): calculation results of inter-class similarity; panels Upper, Middle, Lower and Average over classes c1, c2, c3; color scale from 0.40 to 0.96]
[Fig. 7, following panels (Upper, Middle; classes c1, c2, c3): plotted values 0.93, 0.92, 0.91, 0.90, 0.87, 0.85]
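The two similarity measures in the result analysis can be sketched as follows. This is a minimal illustration, assuming each class is given as a NumPy array with one non-zero feature vector per row; the function names `cosine_sim` and `class_similarity` are hypothetical and not taken from the paper.

```python
import numpy as np


def cosine_sim(x, y):
    """sim(x, y): cosine of the angle between two non-zero feature vectors."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


def class_similarity(c1, c2):
    """SIM(C1, C2): mean of sim(x, y) over all pairs x in C1, y in C2."""
    c1 = np.asarray(c1, dtype=np.float64)
    c2 = np.asarray(c2, dtype=np.float64)
    # Normalise each row to unit length so a dot product equals the cosine.
    c1_unit = c1 / np.linalg.norm(c1, axis=1, keepdims=True)
    c2_unit = c2 / np.linalg.norm(c2, axis=1, keepdims=True)
    # The N1 x N2 matrix holds every pairwise similarity; its mean is
    # the double sum divided by N1 * N2.
    return float((c1_unit @ c2_unit.T).mean())


if __name__ == "__main__":
    class_a = np.array([[1.0, 0.0], [2.0, 0.0]])
    class_b = np.array([[0.0, 1.0], [0.0, 3.0]])
    print("inter-class:", class_similarity(class_a, class_b))
    print("intra-class:", class_similarity(class_a, class_a))
```

Calling `class_similarity(c, c)` gives an intra-class value that, following the formula as written, includes each sample paired with itself; excluding those self-pairs would be a different design choice.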