BPNN has an accuracy of 94.3%, correctly detecting 66 of 70 images of fractured bones.
Chung S. W. [4] also demonstrated the high performance of a Deep Convolutional Neural Network (DCNN) in detecting and classifying proximal humerus fractures. They used a total of 1891 images, of which 1376 showed fractures and 515 were normal. The training dataset had 1702 images, while the testing dataset had 189 images. The DCNN reached an accuracy of 96%, with a sensitivity of 99% and a specificity of 97%.
Another CNN-based work is that of Yahalomi E [5], who used Faster R-CNN, an object detection network, to detect and classify wrist (distal radius) fractures. The dataset consisted of 55 anteroposterior images with fractures, 40 normal anteroposterior images without fractures, and 25 additional images of other types of bones, which served as the negative label. The pretraining dataset had 96 images and the training dataset had 24 images. Faster R-CNN achieved an accuracy of 96%.
This paper proposes and evaluates a bone abnormality detection system trained on a larger dataset, to help radiologists reduce the risk of misdiagnosis and the serious harm it can cause. We use a Convolutional Neural Network for bone type classification and six different deep learning architectures for abnormality classification. We then measure and compare the performance of each architecture to select the best one for the bone abnormality detection system.
2 Proposed Bone Abnormalities Classification System
The workflow of the proposed bone abnormality classification system for radiographic images is shown in Figure 1.
Figure 1: Bone abnormalities detection system workflow
In the first stage, the system receives the input bone image. A Convolutional Neural Network (CNN) model then classifies the type of bone into one of seven classes: Humerus, Elbow, Wrist, Shoulder, Finger, Forearm, and Hand. The predicted bone type determines which model processes the image further. Several deep learning models are used in our research, namely DenseNet (201, 169, 121), ResNet, Inception, and VGG. Each of these architectures is trained as seven separate models, one for each bone type.
The selected model processes the input image to determine whether an abnormality is present. Finally, the system outputs the bone type and its abnormality.
3 Bone Abnormalities Classification Method
3.1 Dataset
Our research used the MURA dataset, which is publicly available and was released for a deep learning competition held by the Machine Learning Group of Stanford University. This dataset has not been reviewed by authoritative sources or organizations and is used for research purposes only. MURA consists of 14863 studies from 12173 patients, with a total of 40561 X-ray images labelled as either normal or abnormal. The bone types are humerus, hand, forearm, finger, wrist, shoulder, and elbow. The MURA dataset was used for both classification phases: bone type classification and abnormality classification [6].
In addition to the MURA dataset, this study used ImageNet as the source of the pre-trained weights for each model. ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds to thousands of images [7].
3.2 Image Pre-processing
First, the images were labelled according to their abnormality (positive for abnormal and negative for normal), so the output has two classes. Using the OpenCV library, the images were then read as RGB matrices and resized to 224 x 224 pixels. The pixel values were normalized to the range [0, 1] by dividing them by 255.
The MURA dataset provides training and validation sets; we used its validation set as our test set. The training set was split randomly into training and validation subsets with a ratio of 85% to 15%.
3.3 Bone types classification
To classify the type of bone in the image before classifying its abnormality, a standard Convolutional Neural Network was used. In this method, the image is passed through multiple convolutions to extract its features.
A Convolutional Neural Network (CNN) is a neural network consisting of an input layer, an output layer, and multiple hidden layers. The hidden layers of a CNN typically consist of convolutional layers, pooling layers, fully connected layers, and normalization layers [8]. The convolutional layer convolves the input image with a kernel filter to extract features of the input image [9]. The kernel filter is a matrix of weights, learned during training, that is smaller than the input image. The pooling layer reduces the dimensionality of the extracted features while retaining the important information. The rectified linear unit (ReLU) activation layer changes all negative values of the filtered image to zero [8]. The convolution, pooling, and ReLU layers are stacked, so the output of one layer becomes the input of the next, and this stack can be repeated. A fully connected layer connects each node in one layer to every node in the next layer. At the end of the network, a classifier determines the image class. The CNN architecture is displayed in Figure 2.
Figure 2: CNN Architecture [10]
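The following minimal NumPy sketch illustrates the pre-processing of Section 3.2 and the layer stack of Section 3.3. It is not the authors' network. The filter counts (4 and 8), the 3x3 kernels, the 2x2 max pooling, the random untrained weights and all function names are illustrative assumptions. Nearest-neighbour indexing stands in for the OpenCV resize used in the paper. split_indices is a hypothetical helper for the random 85%/15% split. It rounds the training count down, which is also an assumption. Under that rule, 33000 images split into 28050 and 4950, the bone-type CNN counts given in Section 5.1.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

BONE_TYPES = ["Humerus", "Elbow", "Wrist", "Shoulder", "Finger", "Forearm", "Hand"]


def preprocess(img, size=224):
    """Resize an (H, W, 3) uint8 RGB image to (size, size, 3) and scale it to [0, 1].

    Nearest-neighbour indexing is an assumption standing in for cv2.resize.
    """
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols].astype(np.float32) / 255.0


def split_indices(n, train_pct=85, seed=0):
    """Randomly split n sample indices into training and validation parts (hypothetical helper)."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = n * train_pct // 100  # rounding down is an assumption
    return idx[:n_train], idx[n_train:]


def conv2d(x, kernels):
    """Valid 2-D convolution (cross-correlation, as in CNN libraries).

    x: (H, W, C_in), kernels: (k, k, C_in, C_out) -> (H-k+1, W-k+1, C_out)
    """
    k = kernels.shape[0]
    windows = sliding_window_view(x, (k, k), axis=(0, 1))  # (H', W', C_in, k, k)
    return np.einsum("hwcij,ijco->hwo", windows, kernels)


def relu(x):
    """Set every negative activation to zero."""
    return np.maximum(x, 0.0)


def max_pool(x, p=2):
    """Non-overlapping p x p max pooling; trailing rows/columns that do not fit are dropped."""
    h, w, c = x.shape
    x = x[: h - h % p, : w - w % p]
    return x.reshape(h // p, p, w // p, p, c).max(axis=(1, 3))


def softmax(z):
    """Turn class scores into probabilities that sum to 1."""
    e = np.exp(z - z.max())
    return e / e.sum()


def init_params(rng, size=224, c1=4, c2=8, n_classes=len(BONE_TYPES)):
    """Random (untrained) weights for two conv blocks and one fully connected layer."""
    s = ((size - 2) // 2 - 2) // 2  # spatial size after two conv(3x3) + pool(2x2) blocks
    return {
        "k1": rng.normal(0, 0.1, (3, 3, 3, c1)),
        "k2": rng.normal(0, 0.1, (3, 3, c1, c2)),
        "w": rng.normal(0, 0.01, (s * s * c2, n_classes)),
        "b": np.zeros(n_classes),
    }


def forward(img, params):
    """Pre-process, apply conv -> ReLU -> pool twice, then flatten -> fully connected -> softmax."""
    x = preprocess(img)
    x = max_pool(relu(conv2d(x, params["k1"])))
    x = max_pool(relu(conv2d(x, params["k2"])))
    return softmax(x.reshape(-1) @ params["w"] + params["b"])


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(300, 280, 3), dtype=np.uint8)  # random stand-in for an X-ray
probs = forward(image, init_params(rng))
print(probs.shape)  # (7,)
```

The predicted bone type would be BONE_TYPES[int(np.argmax(probs))]. With untrained weights that prediction is arbitrary. The sketch only shows how the stacked layers transform a 224 x 224 x 3 input into seven class probabilities.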
Bone Abnormality Classification using Deep Learning on Radiographic Images
IC3INA’22, November, 2022, Bandung, Indonesia
Figure 3: The model training process
3.4.1 VGG-16
VGG-16 is one of the most popular CNN models for image classification. It was proposed by Karen Simonyan and Andrew Zisserman in 2014. The VGG-16 model has 16 weight layers: 13 convolutional layers using 3x3 kernel filters, with a pooling layer at the end of each convolutional stack, followed by 2 fully connected layers and one classifier layer [11]. The architecture of the VGG-16 model used to classify bone fractures in this work is shown in Figure 4.
3.4.3 ResNet50
Residual Network (ResNet) is a CNN model built from residual building blocks (RBBs), which make deep networks easier to train on complicated tasks and increase detection accuracy [15]. This model was proposed by Kaiming He et al. in 2015. The idea of the RBB is to let the input skip a block of convolutional layers through a shortcut connection. An RBB consists of several convolutional and batch normalization (BN) layers [16]. The RBB is shown in Figure 6.
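The shortcut idea behind the RBB can be sketched in NumPy as follows. This is a simplified illustration, not the ResNet50 configuration used in this work. It shows the two-layer "basic" block: conv, BN, ReLU, conv, BN, then addition of the input. ResNet-50 itself stacks three-layer bottleneck blocks (1x1, 3x3 and 1x1 convolutions). The batch normalization here is a simplification that uses the statistics of a single feature map instead of a mini-batch. The function names, shapes and random weights are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3x3_same(x, kernels):
    """3x3 convolution with one pixel of zero padding, so height and width are kept.

    x: (H, W, C_in), kernels: (3, 3, C_in, C_out) -> (H, W, C_out)
    """
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    windows = sliding_window_view(padded, (3, 3), axis=(0, 1))  # (H, W, C_in, 3, 3)
    return np.einsum("hwcij,ijco->hwo", windows, kernels)


def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Per-channel normalization over the spatial axes of one feature map.

    Simplification: real BN uses statistics computed over a mini-batch.
    """
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def residual_block(x, k1, k2):
    """Basic residual block: output = ReLU(F(x) + x), with F = conv-BN-ReLU-conv-BN.

    k1 and k2 must map C channels back to C channels so F(x) and x can be added.
    """
    f = np.maximum(batch_norm(conv3x3_same(x, k1)), 0.0)
    f = batch_norm(conv3x3_same(f, k2))
    return np.maximum(f + x, 0.0)  # shortcut connection: x skips the two conv layers


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 4))
k1 = rng.normal(0, 0.1, (3, 3, 4, 4))
k2 = rng.normal(0, 0.1, (3, 3, 4, 4))
print(residual_block(x, k1, k2).shape)  # (8, 8, 4)
```

Consider what happens when the residual branch F contributes nothing, for example when the second kernel is all zeros. The block then reduces to ReLU(x), so the input passes through unchanged apart from the final activation. This is the property that lets very deep stacks of such blocks remain trainable.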
4.2 Accuracy
Accuracy is the ratio of correctly predicted images (the sum of true positives and true negatives) to the total number of images.
5 Result
All results of the model evaluation for bone fracture detection were produced with TensorFlow 2.0 and its Keras API, together with Scikit-Learn, under Python 3.
5.1 Dataset
Table 1. Dataset Distribution
Bone type    Total   Training   Validation   Testing
Humerus       1272        850          150       272
Elbow         4931       3825          675       431
Wrist         9739       8278         1461       656
Shoulder      8364       7109         1255       563
Finger        5106       4340          766       461
Forearm       1825       1551          247       301
Hand          5543       4711          832       460
For bone type classification, the CNN was trained on 28050 images and validated on 4950 images, while testing used 3808 images.
Figure 8: Model testing results comparison
The testing accuracy of each model differed across bone types, as compared in Figure 8. The hand type has the lowest average accuracy of all bone types. The DenseNet121 model achieved the highest hand accuracy at 66.96%, and Inception the lowest at 58%.
Apart from the hand bone type, the VGG model shows noticeably good performance in classifying bone abnormalities. It achieves the highest accuracy on every other bone type, with the humerus type reaching the highest accuracy of 81.62%. The average accuracy of the VGG model across all bone types is 73.29%, clearly higher than the other models, which all average below 70%.
All DenseNet networks and ResNet show relatively similar performance, averaging around 65% across all bone types, although their peak accuracies occur on different bone types. Compared with the other models, Inception consistently gives the fewest correct classifications for all bone types.
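The accuracies compared above follow the definition in Section 4.2. The false positives discussed for the confusion matrices come from the same four counts. The sketch below computes them for binary labels (1 = abnormal/positive, 0 = normal/negative). The function names are illustrative. The false-positive rate FP / (FP + TN) is an assumed definition, because the paper does not state which denominator its false-positive percentages use. The paper lists Scikit-Learn among its tools: for binary labels, sklearn.metrics.confusion_matrix(y_true, y_pred).ravel() returns the same counts in the order (tn, fp, fn, tp).

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, 1 = abnormal (positive), 0 = normal."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Correct predictions (TP + TN) divided by all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


def false_positive_rate(tp, tn, fp, fn):
    """Share of normal images wrongly flagged as abnormal (assumed denominator FP + TN)."""
    return fp / (fp + tn)


counts = confusion_counts([1, 1, 0, 0, 0], [1, 0, 1, 0, 0])
print(counts)                        # (1, 2, 1, 1)
print(accuracy(*counts))             # 0.6
print(false_positive_rate(*counts))  # 0.3333333333333333
```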
The average accuracy of the Inception model across all bone types is only 50.5%.
Figure 9: Evaluation of the model with confusion matrices
Figure 9 shows the evaluation of the models with confusion matrices. Each matrix combines the abnormality classification results for all bone types, so it shows the overall performance of the model. Judging from the matrix colors, all models perform similarly except for the Inception model, which has the most false positives, the type of error we aim to avoid. The accuracy results above show that VGG performs best, but DenseNet169 has the fewest false positives. However, all models still have over 20% false positives, so further improvement is needed.
6 Conclusion
The proposed bone abnormality classification system produces classification results for both bone type and bone abnormality. Bone type classification using a CNN produces good results, while bone abnormality classification shows varied results. The comparison of the bone abnormality classification models shows that the VGG model has the highest accuracy on almost all bone types, while the Inception model has the lowest accuracy on all bone types. However, all models still produce a high rate of false positives, so they need to be improved.
The datasets used in this research comprised different types of abnormalities. Therefore, more studies are needed to evaluate the optimal integration of this model and other deep learning models in a clinical setting. Attention maps may also need to be implemented so that the network can focus on the abnormal features.
ACKNOWLEDGMENTS
This study is part of the Smart Direct Digital Radiography research at the Research Center for Electronics, National Research and Innovation Agency.
REFERENCES
[1] H. R. Guly, “Diagnostic errors in an accident and emergency department,” Emergency Medicine Journal, vol. 18, no. 4, pp. 263–269, Jul. 2001, doi: 10.1136/emj.18.4.263.
[2] A. Mathew, P. Amudha, and S. Sivakumari, “Deep Learning Techniques: An Overview,” 2021, pp. 599–608, doi: 10.1007/978-981-15-3383-9_54.
[3] K. Dimililer, “IBFDS: Intelligent bone fracture detection system,” Procedia Comput Sci, vol. 120, pp. 260–267, 2017, doi: 10.1016/j.procs.2017.11.237.
[4] S. W. Chung et al., “Automated detection and classification of the proximal humerus fracture by using deep learning algorithm,” Acta Orthop, vol. 89, no. 4, pp. 468–473, Jul. 2018, doi: 10.1080/17453674.2018.1453714.
[5] E. Yahalomi, M. Chernofsky, and M. Werman, “Detection of distal radius fractures trained by a small set of X-ray images and Faster R-CNN,” Dec. 2018.
[6] P. Rajpurkar et al., “MURA: Large Dataset for Abnormality Detection in Musculoskeletal Radiographs,” Dec. 2017.
[7] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2009, pp. 248–255, doi: 10.1109/CVPR.2009.5206848.
[8] M. Hussain, J. J. Bird, and D. R. Faria, “A Study on CNN Transfer Learning for Image Classification,” 2019, pp. 191–202, doi: 10.1007/978-3-319-97982-3_16.
[9] S. A. Singh, T. G. Meitei, and S. Majumder, “Short PCG classification based on deep learning,” in Deep Learning Techniques for Biomedical and Health Informatics, Elsevier, 2020, pp. 141–164, doi: 10.1016/B978-0-12-819061-6.00006-9.
[10] The MathWorks, Inc., “What is a Convolutional Neural Network? - MATLAB & Simulink,” Sep. 30, 2022.
[11] S. Tammina, “Transfer learning using VGG-16 with Deep Convolutional Neural Network for Classifying Images,” International Journal of Scientific and Research Publications (IJSRP), vol. 9, no. 10, p. p9420, Oct. 2019, doi: 10.29322/IJSRP.9.10.2019.p9420.
[12] C. Szegedy et al., “Going deeper with convolutions,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2015, pp. 1–9, doi: 10.1109/CVPR.2015.7298594.
[13] Z. Dongmei, W. Ke, G. Hongbo, W. Peng, W. Chao, and P. Shaofeng, “Classification and identification of citrus pests based on InceptionV3 convolutional neural network and migration learning,” in 2020 International Conference on Internet of Things and Intelligent Applications (ITIA), Nov. 2020, pp. 1–7, doi: 10.1109/ITIA50152.2020.9312359.
[14] M. Tripathi, “Analysis of Convolutional Neural Network based Image Classification Techniques,” Journal of Innovative Image Processing, vol. 3, no. 2, pp. 100–117, Jun. 2021, doi: 10.36548/jiip.2021.2.003.
[15] I. Z. Mukti and D. Biswas, “Transfer Learning Based Plant Diseases Detection Using ResNet50,” in 2019 4th International Conference on Electrical Information and Communication Technology (EICT), Dec. 2019, pp. 1–6, doi: 10.1109/EICT48899.2019.9068805.
[16] L. Wen, X. Li, and L. Gao, “A transfer convolutional neural network for fault diagnosis based on ResNet-50,” Neural Comput Appl, vol. 32, no. 10, pp. 6111–6124, May 2020, doi: 10.1007/s00521-019-04097-w.
[17] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely Connected Convolutional Networks,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jul. 2017, pp. 2261–2269, doi: 10.1109/CVPR.2017.243.
[18] J. Zhang, C. Lu, X. Li, H.-J. Kim, and J. Wang, “A full convolutional network based on DenseNet for remote sensing scene classification,” Mathematical Biosciences and Engineering, vol. 16, no. 5, pp. 3345–3367, 2019, doi: 10.3934/mbe.2019167.