
A Classification Method of the Osteonecrosis of Femoral Head Based on Deep Learning


Liyang Zhu, Jungang Han
Shaanxi Key Laboratory of Network Data Analysis and Intelligence Processing
School of Computer, Xi’an University of Posts & Telecommunications
Xi’an, China
e-mail: zlyinter@sina.com

Shaojie Tang
Shaanxi Key Laboratory of Network Data Analysis and Intelligence Processing
School of Automation, Xi’an University of Posts & Telecommunications
Xi’an, China
email: tangshaojie@xupt.edu.cn

Wei Chai
Department of Orthopaedics
Chinese PLA General Hospital
Beijing, China
email: royal1860@sina.cn

Abstract—Aiming at the unstable classification of osteonecrosis of the femoral head, a classification method based on deep learning is proposed. A 3D U-net is used to segment the CT data and obtain the complete femoral region. A Convolutional Autoencoder is then used for dimensionality reduction and feature extraction, which yields the deep information of the segmented femur. Finally, K-means is used to cluster the extracted features and obtain the classification result. Experiments on 120 patient cases show that all four indexes used to evaluate the segmentation, including the Dice coefficient, are higher than 94%. The peak signal-to-noise ratio (PSNR) between the input and output images of the Convolutional Autoencoder is higher than 34 dB. The average inter-class and intra-class similarities of the classification method are 51% and 87%, respectively. This indicates large differences between the femur types and high similarity within each type. The proposed method has good clinical application value for the individualized treatment of patients. It can assist doctors in carrying out more targeted treatment for patients with early osteonecrosis of the femoral head.

Keywords—Osteonecrosis of Femoral Head, 3D U-net, Convolutional Autoencoder, K-means Clustering

I. INTRODUCTION
Osteonecrosis of the Femoral Head (ONFH) is a common complication in orthopaedics. It is probably caused by the death of bone cells and bone marrow components, followed by a cellular repair reaction, after the blood supply of the femoral head is damaged or interrupted. In more severe cases, it can lead to joint pain and joint dysfunction [1]. ONFH can also result from many other causes, such as hip trauma, corticosteroid use, alcoholism, decompression sickness and sickle cell anemia [1]. Once the articular surface has collapsed, the physician usually has to replace the joint, which is both costly and risky. Hence, it is particularly important to study the classification of early ONFH and to develop individualized treatment plans for patients.

Earlier ONFH classifications were all based on X-ray images and clinical symptoms, and they have certain limitations. These include the classifications of Marcus [2], Ficat and Arlet [3], Steinberg [4] and the University of Pennsylvania [5]. Currently, a relatively ideal staging method is the international ONFH standard proposed by the Association Research Circulation Osseous (ARCO) in 1992. It incorporates Magnetic Resonance Imaging (MRI) findings and an assessment of the degree of femoral head involvement [6]. Several other ONFH classification methods also exist:
- the Steinberg classification [4];
- the improved Kerboul classification [7];
- the Japanese Investigation Committee (JIC) classification [8];
- the China-Japan Friendship Hospital (CJFH) three-pillar structure classification [9].

The ARCO staging divides ONFH into four stages according to the extent of the lesion and the degree of femoral head collapse [6]. The Steinberg classification places ONFH into three groups based on the proportion of necrotic volume in the total femoral head volume [4]. The improved Kerboul classification divides it into three categories according to the necrotic angles on the coronal and sagittal planes of the femoral head [7]. In the JIC classification, the femoral head is separated into three parts on the coronal plane, and ONFH is grouped into four types according to the extent of the necrotic area [8]. Similarly, the CJFH classification divides the femoral head into three parts on the coronal plane and groups ONFH into five categories according to the extent of the necrotic area [9].

Li et al. observed the natural course of ONFH for many years. They concluded that the CJFH classification predicts mild and severe ONFH with high accuracy but moderate ONFH with low accuracy. As demonstrated in Ref. [10], the JIC classification shows good inter-observer consistency, whereas the consistency of the Steinberg and improved Kerboul classifications is relatively inferior. Although the JIC classification predicts ONFH accurately, its accuracy on moderate lesions is still low.

The existing classifications of osteonecrosis of the femoral head therefore have certain limitations:
- they depend mainly on the personal experience of doctors and are not stable enough;
- their results are not detailed enough to characterize the disease and its pathological changes;
- they cannot predict patient prognosis accurately enough.

However, there are few reports that combine the diagnosis of osteonecrosis of the femoral head with big data and artificial intelligence.

Aiming at the instability of traditional classification methods, this paper proposes a new method based on deep learning:



(1) A 3D U-net is used to segment the femur in the CT data. This obtains the complete femur and removes interference from other tissues and organs in the CT section. The high dimensionality of 3-D data causes insufficient memory and slow training, so this paper improves the data processing. The original 512×512×128 3-D CT femoral data are randomly cropped into 64×64×64 3-D sub-volumes, and these sub-volumes are used for training. This effectively reduces memory consumption and training time.

(2) A Convolutional Autoencoder is used to extract reduced-dimension features from the segmented 3-D femur, in order to obtain the deep implicit information of the femoral region. In this paper, an improvement is proposed in the encoding part: convolutions with different kernel sizes are applied and their outputs are superimposed. This yields receptive fields of different scales and extracts multi-scale information between layers. This helps improve the accuracy of the encoding, so that the original information can be compressed better.

(3) K-means is used to cluster the extracted features in order to obtain the final classification of osteonecrosis of the femoral head.

II. RELATED WORK
A. 3D U-net
Deep learning has recently developed rapidly in image processing, and medical image segmentation is one of the important topics of medical image research. Ronneberger et al. [11] proposed the U-Net, which combines deep and shallow information through concatenation to achieve better segmentation. However, U-Net cannot directly process 3-D medical images; it has to cut a 3-D image into multiple 2-D slices before processing. This loses the information shared between adjacent slices of the 3-D image and results in poor segmentation details. Therefore, Çiçek et al. proposed the 3D U-net [12], which solves this problem and improves segmentation accuracy.

The network structure of the 3D U-net is shown in Figure 1. In the encoding part, each layer contains two 3×3×3 convolutions, each followed by a Rectified Linear Unit (ReLU). In the decoding part, each layer consists of a 2×2×2 up-convolution with a stride of 2 in each dimension, followed by two 3×3×3 convolutions, each followed by a ReLU. In the last layer, a 1×1×1 convolution reduces the number of output channels to the number of labels. In this paper, we set the number of labels to 2 in order to distinguish the femoral region from the background.

B. Convolutional Autoencoder
The autoencoder is a learning algorithm based on Back Propagation (BP). It learns an approximation of the identity function, $h_{W,b}(x) \approx x$, so that its outputs are close to its inputs and the data are encoded effectively. The input vector $x$ is mapped to the hidden representation $y$ by the mapping function $f$:

$$y = h_{W,b}(x) = f(W^{T}x + b) = f\Big(\sum_{i=1}^{n_i} W_i x_i + b\Big) \tag{1}$$

In (1), $n_i$ is the number of input units, $W$ is a weight matrix of dimensions $d' \times d$, and $b$ is the bias vector. The activation is the sigmoid function $f(z) = 1/(1 + \exp(-z))$. For the dataset $\{(x_1, y_1), \ldots, (x_m, y_m)\}$, the objective function can be formulated as:

$$J(W, b) = \frac{1}{m}\sum_{i=1}^{m} L(x_i, y_i) \tag{2}$$

Here, $L$ is the loss for the $i$-th sample.

Convolutional Neural Networks have been shown to perform excellently in image feature extraction. In our experiments, we define a 3-D Convolutional Autoencoder and train it on the 3-D femur data segmented by the 3D U-net. This gives us deep features of the raw data. Compared with the raw data, these features reflect the implicit information of the femur data better, with fewer dimensions.

The segmented 3-D femur data are used as the input of the Convolutional Autoencoder. To ensure isotropy, the 3-D femur data are processed to 128×128×128. In the encoder layers, we apply convolutions with 3×3×3 and 5×5×5 kernels separately and add the results together to extract multi-scale information. Meanwhile, we use 3×3×3 convolution kernels for the deconvolution layers, and strided convolution layers instead of pooling layers in the decoder. In this way, the outputs are as consistent as possible with the inputs, and the mid-layer feature can be extracted as the deep implicit information of the raw data.

The Convolutional Autoencoder network structure is shown in Figure 2. The encoder maps the 128×128×128×1 input through the following feature maps:
- 128×128×128×16
- 64×64×64×32
- 32×32×32×64
- 16×16×16×128
- 8×8×8×256
- 4×4×4×512
- 2×2×2×1024

This produces a 1×1×1×2048 code, and the decoder mirrors this path back to a 128×128×128×1 output.
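The paper gives no code for its multi-scale encoder, so the following is only a minimal single-channel sketch of the idea of adding a 3×3×3 and a 5×5×5 convolution of the same input. It rests on several assumptions:
- kernels are cubic with odd side lengths;
- both branches use zero "same" padding and a shared stride (2, as listed in the legend of Fig. 2);
- bias and activation are omitted.

The names `conv3d_same`, `multi_scale_block` and `constant_cube` are hypothetical. A trainable model would instead use learned multi-channel kernels in a framework such as TensorFlow, which the authors report using.

```python
from typing import List

Volume = List[List[List[float]]]  # single-channel volume indexed as vol[z][y][x]


def constant_cube(n: int, value: float = 1.0) -> Volume:
    """Build an n x n x n volume (or cubic kernel) filled with a single value."""
    return [[[value] * n for _ in range(n)] for _ in range(n)]


def conv3d_same(vol: Volume, kernel: Volume, stride: int = 1) -> Volume:
    """Single-channel 3-D convolution (cross-correlation, as in CNNs) with zero 'same' padding.

    Output positions are sampled every `stride` voxels, so any odd kernel size
    gives ceil(n / stride) outputs per axis and branches can be added voxel-wise.
    """
    d, h, w = len(vol), len(vol[0]), len(vol[0][0])
    k = len(kernel)   # assumes a cubic kernel with an odd side length
    r = k // 2        # padding on each side keeps the kernel centred on the voxel
    out: Volume = []
    for z in range(0, d, stride):
        plane = []
        for y in range(0, h, stride):
            row = []
            for x in range(0, w, stride):
                acc = 0.0
                for dz in range(k):
                    zz = z + dz - r
                    if not 0 <= zz < d:
                        continue  # outside the volume: zero padding adds nothing
                    for dy in range(k):
                        yy = y + dy - r
                        if not 0 <= yy < h:
                            continue
                        for dx in range(k):
                            xx = x + dx - r
                            if 0 <= xx < w:
                                acc += vol[zz][yy][xx] * kernel[dz][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def multi_scale_block(vol: Volume, k3: Volume, k5: Volume, stride: int = 2) -> Volume:
    """Add the 3x3x3 and 5x5x5 responses voxel by voxel (bias and activation omitted)."""
    a = conv3d_same(vol, k3, stride)
    b = conv3d_same(vol, k5, stride)
    return [[[p + q for p, q in zip(ra, rb)] for ra, rb in zip(pa, pb)]
            for pa, pb in zip(a, b)]


out = multi_scale_block(constant_cube(4), constant_cube(3), constant_cube(5), stride=2)
print(out[0][0][0], out[1][1][1])  # 35.0 91.0
```

Giving both branches "same" padding and the same stride is what makes the element-wise sum possible. With "valid" padding, the 5×5×5 branch would produce a smaller output than the 3×3×3 branch, and the two could not be added directly.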
Fig. 1. 3D U-net network architecture (legend: conv (+BN) + ReLU, max pooling, up-conv, concat)

Fig. 2. Convolutional Autoencoder network structure (legend: Conv with kernel 3×3×3 and stride 2; Conv with kernel 5×5×5 and stride 2)

C. K-means Clustering
Cluster analysis divides objects into meaningful groups, which helps to further understand and describe the data. The definition of a cluster was first proposed by Everitt in 1974. It requires that the objects in a group be relatively compact in space. It also requires that the distance between any two objects in the same group be less than the distance between any two objects in different groups.

K-means is an unsupervised, prototype-based algorithm. It evaluates similarity by the distance between objects: closer objects are more similar, and more distant objects are less similar [13].

The main advantages of K-means clustering are that it is efficient, concise, scalable and effective on large datasets. In our work, K-means clustering is applied to the deep features extracted by the Convolutional Autoencoder, which gives an unsupervised classification of early ONFH. The experimental results below show a high intra-class similarity and a significant inter-class difference.

III. EXPERIMENT
The femur CT data, provided by the Chinese PLA General Hospital, contain 120 cases with 128 CT slices per case. The data were desensitized in batch by removing basic patient information such as name and identification number. The patients are 45 to 65 years old. The data are split as follows:
- 10 percent are randomly selected as the validation set to explore the best model parameters;
- another 10 percent are selected as the test set;
- the remainder are used as the training set.

A. Femoral segmentation
In our experiment, the 3D U-net is used to segment the 3-D CT slices of the femur. We use the Adam optimizer (β1 = 0.5, β2 = 0.999) with a learning rate of 0.0002, a mini-batch size of 1 and 300 iterations. We implement our models with TensorFlow. We train them on an NVIDIA DGX-1V machine equipped with 8 Tesla V100 GPUs and 512 GB of memory.

Figure 3 shows the segmentation results for different parts of the CT slices. Image is the original CT slice, Ground Truth is the manually annotated femur region, and Result is the femur region segmented by the 3D U-net.

Fig. 3. 3D U-net segmentation of the femur region results ((a) Image, (b) Ground Truth, (c) Result; rows: Upper, Middle, Lower)

For the evaluation of 3D U-net performance, four indexes are used: Dice, Recall, Precision and F-measure. They are defined as follows:

$$\mathrm{Dice} = \frac{2\,|X \cap Y|}{|X| + |Y|} \tag{3}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{4}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{5}$$

$$F\text{-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{6}$$

The symbols are defined as follows:
- TP is the number of femoral pixels correctly segmented;
- TN is the number of background pixels correctly segmented;
- FP is the number of background pixels incorrectly segmented as femur;
- FN is the number of femoral pixels incorrectly labeled as background;
- X and Y represent the segmentation results of the physicians and of the network, respectively.

We report the average values for different parts of the CT slices in Table I.

TABLE I. 3D U-NET SEGMENTATION RESULT

Region   Dice     Precision   Recall   F-measure
Upper    0.9521   0.9645      0.9624   0.9518
Middle   0.9544   0.9436      0.9411   0.9602
Lower    0.9602   0.9589      0.9612   0.9601

As shown in Table I, the segmentation results of the 3D U-net for the femoral region in CT slices are higher than 0.94 on Dice, Recall, Precision and F-measure. These results show that the 3D U-net used in this paper can effectively segment the femoral region in different slices.

B. Feature extraction
The Convolutional Autoencoder is used to reduce dimensions and extract features of the femur. Figure 4 shows the results of our model on three femoral CT images.

Fig. 4. The effects of the proposed 3-D Convolutional Autoencoder for three femoral CT images (rows: Input, Output; columns: Upper, Middle, Lower)

We use the Peak Signal-to-Noise Ratio (PSNR) to evaluate the similarity between the input and output of the Convolutional Autoencoder:

$$\mathrm{MSE} = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\big[I(i,j) - K(i,j)\big]^{2} \tag{7}$$

$$\mathrm{PSNR} = 10 \cdot \log_{10}\left(\frac{\mathrm{MAX}_I^{2}}{\mathrm{MSE}}\right) \tag{8}$$

where $I$ and $K$ represent two images of size $m \times n$.
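The segmentation indexes and the MSE/PSNR measure defined above can be computed with the following minimal sketch. The function names are hypothetical. It assumes the masks are flattened to 0/1 sequences (1 = femur), with `truth` standing for the physician mask X and `pred` for the network mask Y. The `max_value` argument stands in for $\mathrm{MAX}_I$, for example 255 for 8-bit images; the intensity range the authors used is not stated. For binary masks, |X ∩ Y| = TP, |X| = TP + FN and |Y| = TP + FP, so Dice and F-measure both reduce to 2TP/(2TP + FP + FN). The example output makes this visible.

```python
import math
from typing import Dict, List, Sequence


def segmentation_indexes(truth: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Dice, Recall, Precision and F-measure for flattened binary masks (1 = femur)."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)  # femur kept as femur
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)  # background labelled femur
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)  # femur labelled background
    if tp == 0:  # no overlap: report zeros instead of dividing by zero
        return {"dice": 0.0, "recall": 0.0, "precision": 0.0, "f_measure": 0.0}
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    # |X ∩ Y| = TP, |X| = TP + FN, |Y| = TP + FP
    dice = 2 * tp / ((tp + fn) + (tp + fp))
    return {"dice": dice, "recall": recall, "precision": precision, "f_measure": f_measure}


def psnr(img_i: List[List[float]], img_k: List[List[float]], max_value: float) -> float:
    """PSNR in dB between two m x n images; max_value plays the role of MAX_I."""
    m, n = len(img_i), len(img_i[0])
    mse = sum((img_i[r][c] - img_k[r][c]) ** 2 for r in range(m) for c in range(n)) / (m * n)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


scores = segmentation_indexes([1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 1, 0])
print(round(scores["dice"], 4), round(scores["f_measure"], 4))  # 0.5714 0.5714
```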
In the PSNR equations above, MSE is the mean square error and $\mathrm{MAX}_I$ represents the maximum grayscale value of the image. The average PSNRs over all slices for some typical patients are shown in Table II.

TABLE II. THE AVERAGE OF PSNR FOR INDIVIDUAL CASES

Case No.    1        2        3        4
PSNR (dB)   34.234   35.876   34.621   34.761

Table II indicates that the PSNRs between the outputs and inputs of the Convolutional Autoencoder are all higher than 34 dB, which shows that they are essentially similar. The experimental results show that feature extraction based on the 3-D Convolutional Autoencoder is feasible.

C. Clustering
In the previous subsections, we reduced the dimensions of the 3-D femoral data and extracted features with the Convolutional Autoencoder. Compared with the raw data, the 2048-dimensional features better reflect the deep implicit information of the data. For clustering, we first select an appropriate k with the Silhouette Coefficient (SC):

$$SC = \frac{b - a}{\max(a, b)} \tag{9}$$

In (9), $a$ is the mean distance between a sample and the other samples in its own class. $b$ is the mean distance between that sample and the samples in the nearest other class. The SC of a clustering is the mean over all samples. Figure 5 shows the Silhouette Coefficients for different k values.

Fig. 5. Silhouette Coefficients for different k values

The SC value is maximal at k = 3, so we take k = 3. Then the K-means algorithm is used to cluster the features, and Principal Component Analysis (PCA) is adopted to reduce the dimensions for display. The clustering results are shown in Figure 6.

D. Result analysis
Cosine similarity is used to quantitatively evaluate the inter-class difference and the intra-class similarity of the clustering results:

$$\mathrm{sim}(x, y) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}}\,\sqrt{\sum_{i=1}^{n} y_i^{2}}} \tag{10}$$

$$\mathrm{SIM}(C_1, C_2) = \frac{1}{N_1 N_2}\sum_{x \in C_1}\sum_{y \in C_2} \mathrm{sim}(x, y) \tag{11}$$

In (10) and (11), $x$ and $y$ represent the feature vectors of two different images, $C_i$ represents a category generated by clustering, and $N_i$ is the number of samples in class $C_i$.

To quantify the experimental results, we select the slice data of the upper, middle and lower segments of the femur to calculate the cosine similarity. Figure 7 shows the cosine similarity of the K-means clustering results.
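The two evaluation measures used in the Clustering and Result analysis subsections can be sketched without external libraries, as follows. The function names are hypothetical. The silhouette uses Euclidean distance, which is an assumption because the paper does not name its distance. It follows the standard (b − a)/max(a, b) form, with b taken over the nearest other cluster. `class_similarity` implements SIM(C1, C2) exactly as written, averaging over all N1·N2 pairs. When it is applied to a class and itself, this therefore includes each vector paired with itself; the paper does not state whether such self-pairs were excluded.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(x: Vector, y: Vector) -> float:
    """sim(x, y): dot product over the product of L2 norms (assumes non-zero vectors)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def class_similarity(c1: List[Vector], c2: List[Vector]) -> float:
    """SIM(C1, C2): mean of sim(x, y) over all N1 * N2 pairs with x in C1 and y in C2."""
    return sum(cosine_similarity(x, y) for x in c1 for y in c2) / (len(c1) * len(c2))


def mean_silhouette(points: List[Vector], labels: List[int]) -> float:
    """Mean of (b - a) / max(a, b) over all samples, using Euclidean distance.

    Assumes at least two clusters; a sample alone in its cluster scores 0.
    """
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the sample's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the members of the nearest other cluster
        b = min(sum(math.dist(points[i], points[j]) for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


features = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
print(round(mean_silhouette(features, labels), 3))                  # 0.9
print(class_similarity([(1.0, 0.0)], [(0.0, 1.0), (1.0, 0.0)]))      # 0.5
```

For the 2048-dimensional features, vectorized equivalents such as scikit-learn's `sklearn.metrics.silhouette_score` and `sklearn.metrics.pairwise.cosine_similarity` would be much faster than these pure-Python loops.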
Fig. 6. The clustering results of K-means (k=3)

Fig. 7. Cosine similarity calculation result: (a) calculation results of inter-class similarity; (b) calculation results of intra-class similarity (classes c1, c2 and c3 for the Upper, Middle and Lower segments and their Average)

The average inter-class cosine similarity is 0.51, which indicates a large difference between classes. The average intra-class cosine similarity is 0.87, which indicates high similarity within each class.

E. Comparison and Analysis of different algorithms
To verify the effectiveness of the proposed method, two other algorithms are used to cluster the femoral data after image segmentation and feature extraction: the density-based method DBSCAN and the hierarchical method BIRCH. The corresponding cosine similarities are calculated. For further comparison, the raw data are also clustered with K-means and the same similarities are calculated. All results are shown in Table III.

TABLE III. COMPARISON OF COSINE SIMILARITY RESULTS OF DIFFERENT CLUSTERING METHODS

Method                Inter-class similarity   Intra-class similarity
DBSCAN                0.63                     0.75
BIRCH                 0.61                     0.78
K-means on raw data   0.74                     0.77
Proposed Method       0.51                     0.87

As can be seen from Table III, the proposed method outperforms the other algorithms on the femoral data in both respects: lower inter-class similarity and higher intra-class similarity. Segmentation and feature extraction of the femoral data therefore greatly improve the final classification results. Figure 8 shows the differences between the femur classes obtained by the proposed method:

In the first class, the bone distribution is more uniform and the bone is relatively full, with no obvious black area.

In the second class, the bone distribution is relatively uniform, with some black areas.

In the third class, the bone distribution is uneven. There is an obvious black area, and its boundary with the white area is distinct.

Fig. 8. Final classification result map ((a) Class 1, (b) Class 2, (c) Class 3)

IV. DISCUSSIONS AND CONCLUSION
In this paper, we propose an automatic approach for classifying early ONFH with deep learning.
- First, a 3D U-net is used to segment the femoral areas in the CT data. The experimental results show that all four segmentation indexes are higher than 94%, so the segmentation method can accurately obtain the regional details of the femur data.
- Second, we propose a 3-D Convolutional Autoencoder to extract deep features of the femur, which drastically reduces the dimensions from 128×128×128 to 2048. PSNR is employed to evaluate the similarity between inputs and outputs, and the result is higher than 34 dB.
- Finally, the K-means algorithm clusters the extracted deep features into three groups. The results demonstrate a high intra-class similarity and a significant inter-class difference.

The proposed approach has valuable clinical merit in classifying early ONFH, and hopefully it can assist physicians in applying more individualized treatment to patients.

At present, the application of AI to the auxiliary diagnosis of ONFH is still at an early stage, and the work reported in this paper is a preliminary exploration in this field. Traditional classifications rely much more on the experience of physicians. The approach proposed in this paper is based on deep learning, which makes it relatively objective compared with those traditional classifications. Currently, we are carrying out a comparison study with clinical classification, and the results will be reported in the near future.

ACKNOWLEDGMENT
This work was supported in part by the following:
- the project for innovation and entrepreneurship of Xi’an University of Posts and Telecommunications, China (2018SC-03);
- the Key Lab of Computer Networks and Information Integration (Southeastern University), Ministry of Education, China (K93-9-2017-03);
- the Department of Education of Shaanxi Province, China (15JK1673);
- the Shaanxi Provincial Natural Science Foundation of China (2016JM8034).

REFERENCES
[1] Sugano, Nobuhiko, et al. "Prognostication of nontraumatic avascular necrosis of the femoral head. Significance of location and size of the necrotic lesion." Clinical Orthopaedics and Related Research 303 (1994): 155-164.
[2] Marcus, Neal D., W. F. Enneking, and Robert A. Massam. "The silent hip in idiopathic aseptic necrosis: treatment by bone-grafting." The Journal of Bone and Joint Surgery 55.7 (1973): 1351-1366.
[3] Ficat, R. P. "Idiopathic bone necrosis of the femoral head. Early diagnosis and treatment." The Journal of Bone and Joint Surgery. British volume 67.1 (1985): 3-9.
[4] Steinberg, M. E., G. D. Hayken, and D. R. Steinberg. "A quantitative system for staging avascular necrosis." The Journal of Bone and Joint Surgery. British volume 77.1 (1995): 34-41.
[5] Steinberg, David R., and Marvin E. Steinberg. "The University of Pennsylvania classification of osteonecrosis." Osteonecrosis. Springer, Berlin, Heidelberg, 2014. 201-206.
[6] Gardeniers, J. W. "A new international classification of osteonecrosis of the ARCO-committee on terminology and classification." ARCO News 4 (1992): 41-46.
[7] Kerboul, M., et al. "The conservative surgical treatment of idiopathic aseptic necrosis of the femoral head." The Journal of Bone and Joint Surgery. British volume 56.2 (1974): 291-296.
[8] Ono, K. "Diagnostic criteria, staging system, and roentgenographic classification of avascular necrosis of the femoral head (steroid induced, alcohol associated, or idiopathic nature)." Annual report of Japanese investigation committee for intractable diseases, avascular necrosis of the femoral head, under the auspices of Ministry of Health and Welfare 331 (1987).
[9] Li, Zirong, et al. "The classification of osteonecrosis of the femoral head based on the three pillars structure: China-Japan Friendship Hospital (CJFH) classification." Chinese Journal of Orthopaedics 32.6 (2012): 515-520.
[10] Takashima, Kazuma, et al. "Which classification system is most useful for classifying osteonecrosis of the femoral head?" Clinical Orthopaedics and Related Research 476.6 (2018): 1240-1249.
[11] Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for biomedical image segmentation." International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2015.
[12] Çiçek, Özgün, et al. "3D U-Net: learning dense volumetric segmentation from sparse annotation." International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2016.
[13] Zhang, Tian, Changchuan Yin, and Lin Pan. "Improved clustering and association rules mining for university student course scores." 2017 12th International Conference on Intelligent Systems and Knowledge Engineering (ISKE). IEEE, 2017.
