
Journal of Physics: Conference Series

PAPER • OPEN ACCESS

To cite this article: S Kana Saputra et al 2022 J. Phys.: Conf. Ser. 2193 012070


ICOSTA 2021 IOP Publishing
Journal of Physics: Conference Series 2193 (2022) 012070 doi:10.1088/1742-6596/2193/1/012070

Pneumonia identification based on lung texture analysis using modified k-nearest neighbour

Kana Saputra S, Insan Taufik, Mhd Hidayat, Dinda Farahdilla Dharma
Study Program of Computer Science, Faculty of Mathematics and Natural Sciences,
Universitas Negeri Medan, North Sumatera, Indonesia

kanasaputras@unimed.ac.id; insantaufik@unimed.ac.id;
mhdhidayat@mhs.unimed.ac.id; dindafarahdilla@mhs.unimed.ac.id

Abstract. Covid-19 is a disease caused by a coronavirus first discovered in China; it can cause mild to severe respiratory infections such as pneumonia. Pneumonia is inflammation and consolidation of lung tissue caused by infectious agents. Pneumonia generally has a high mortality rate, as does Covid-19. Distinguishing pneumonia from Covid-19 is currently very difficult because their X-ray images are highly similar. This research aims to distinguish pneumonia patients from Covid-19 patients based on texture analysis with the Gray Level Co-Occurrence Matrix (GLCM), using Modified k-Nearest Neighbour (MKNN) as the classifier. The GLCM features calculated are Contrast, Correlation, Energy, and Homogeneity, which serve as input to the MKNN classifier. The highest accuracy, 87.5%, was obtained with K = 3, Manhattan distance, and an 80%:20% training:testing split. Overall, accuracy decreased or stayed the same as K increased, and it was identical for K = 7 and K = 9 in every scenario.

1. Introduction
The world is currently experiencing one of the biggest health disasters in human history: the coronavirus pandemic. Coronavirus Disease 2019 (Covid-19) was first identified in Hubei Province, China, through reports of a type of pneumonia whose cause was not yet known [1]. According to World Health Organization (WHO) data from March 11, 2021, confirmed Covid-19 cases reached 117,799,584, with 2,615,018 deaths, across 223 countries. For Indonesia, data from the Covid-19 Task Force on March 11, 2021 reported 1,403,722 confirmed positive cases and 38,049 deaths. Covid-19 can cause not only mild respiratory infections but also severe respiratory infections such as pneumonia, and even death [2].
Covid-19 produces symptoms very similar to those of pneumonia, which is inflammation of the lungs caused by pathogenic infection [3]. Pneumonia caused by Covid-19 differs slightly from pneumonia in general. The inflammation can cause the air sacs of the lungs to become infected and fill with fluid, although pneumonia can heal on its own if the patient's immune system is strong. Covid-19, by contrast, generally attacks the upper respiratory tract first, can cause blockages in the respiratory organs, and can eventually spread to the lungs [4]. This study aims to distinguish Pneumonia and Covid-19 patients based on texture analysis with the Gray-Level Co-Occurrence Matrix (GLCM), using Modified k-Nearest Neighbour (MKNN) as the classifier.

Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd
MKNN is a classification method. It has been applied to classify the function of active compounds based on Simplified Molecular Input Line Entry System (SMILES) codes, reaching an accuracy of 73% with K = 3 [5]. MKNN has also been used to classify hoax news about the presidential election, with a precision of 93.75%, a recall of 90.90%, and an accuracy of 92.30% [6]. GLCM has been applied to extract features from cervical cancer colposcopy data, with a support vector machine as the classifier, achieving an accuracy of 90% [7]. These studies suggest that GLCM is suitable for feature extraction and that MKNN performs well as a classifier.
Based on this, the present research applies MKNN to distinguish the lungs of Covid-19 patients from the lungs of pneumonia patients. Features are extracted with a texture analysis approach using the GLCM method: Contrast, Correlation, Energy, and Homogeneity are calculated and used as the input to the MKNN classifier.

2. Research Method
The stages of this research are shown in Figure 1.

Figure 1 flowchart: Data Collection → Data Preprocessing (Data Enhancement) → Feature Extraction (Gray Level Co-occurrence Matrix) → Data → Data Splitting (Training Data, Testing Data) → Modified KNN (Training and Testing) → Model → Analysis and Evaluation

Figure 1. Research Stages.

2.1. Data Collection


The data consist of lung X-ray images from a public (open) dataset obtained from previous research (https://github.com/muhammedtalo/COVID-19) [8]. A total of 200 images were used: 100 Covid-19 X-ray images and 100 pneumonia X-ray images.

2.2. Data Pre-processing


At this stage, image quality is improved using the Contrast Limited Adaptive Histogram Equalization (CLAHE) method. CLAHE is a generalization of the Adaptive Histogram Equalization (AHE) method and can produce images of much better quality than the unprocessed originals [9]. CLAHE has also been reported to outperform the High-Frequency Emphasis (HFE) method in improving image quality [10]. The stages of the CLAHE method are as follows [9], [11]:

Step 1: The original image is divided into several sub-images of size M × N.
Step 2: The histogram of each sub-image is calculated.
Step 3: The histogram of each sub-image is clipped.
The pixels of each sub-image are distributed over its gray levels, and the average number of pixels per gray level is given by Equation (1), where $N_{SI-XP}$ and $N_{SI-YP}$ are the numbers of pixels along the X and Y dimensions of the sub-image and $N_{graylevel}$ is the number of gray levels.
$$N_{avg} = \frac{N_{SI-XP} \times N_{SI-YP}}{N_{graylevel}} \tag{1}$$

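Based on Equation (1), the clip limit is calculated using Equation (2).
$$N_{C-L} = N_c \times N_{avg} \tag{2}$$
where $N_{C-L}$ is the actual clip limit and $N_c$ is the normalized clip limit, i.e. the maximum multiple of the average number of pixels $N_{avg}$ allowed at each gray level of a sub-image.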

In the original histogram, a gray level is clipped if its number of pixels is greater than $N_{C-L}$. The total number of clipped pixels, $N_{TC}$, is distributed evenly over all gray levels, so that each gray level receives $N_d$ pixels, as given by Equation (3).
$$N_d = \frac{N_{TC}}{N_{graylevel}} \tag{3}$$

$H_{SI}(i)$ is the number of pixels at gray level $i$ of a sub-image. Using Equation (3), the contrast-limited histogram $H_{NSI}(i)$ of each sub-image is calculated using Equation (4).
$$H_{NSI}(i) = \begin{cases} N_{C-L}, & \text{if } H_{SI}(i) > N_{C-L} \\ N_{C-L}, & \text{else if } H_{SI}(i) + N_d \ge N_{C-L} \\ H_{SI}(i) + N_d, & \text{otherwise} \end{cases} \tag{4}$$

After the distribution in Equation (4), the number of clipped pixels that remain is denoted $N_{RP}$. The step size $S$ used to distribute them is given by Equation (5).
$$S = \frac{N_{graylevel}}{N_{RP}} \tag{5}$$

The remaining pixels are distributed by scanning the gray levels from the minimum to the maximum value with step $S$. Whenever the number of pixels at a gray level is less than $N_{C-L}$, one pixel is added to that gray level. If the scan ends before all remaining pixels have been distributed, $S$ is recalculated with Equation (5) and a new scan is started, until all pixels are distributed. The result is a new, contrast-limited histogram.
Step 4: The contrast-limited histogram of each sub-image is processed with histogram equalization (HE). The pixels of the sub-images are then mapped using linear interpolation.
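The clipping and redistribution of Step 3 can be sketched in Python for the histogram of a single sub-image. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `clip_histogram`, truncating $N_{C-L}$, $N_d$, and $S$ to integers, and the rule for shifting the scan start when a pass adds no pixels are our own choices, and the equalization and interpolation of Step 4 are not shown. As a worked example, a hypothetical 8 × 8 sub-image has 64 pixels; with 4 gray levels Equation (1) gives $N_{avg} = 16$, and $N_c = 1.5$ gives $N_{C-L} = 24$ through Equation (2).

```python
def clip_histogram(hist, n_c):
    """Clip one sub-image histogram and redistribute the excess (Eqs. 1-5).

    hist: pixel counts per gray level, H_SI(i)
    n_c:  normalized clip limit N_c (a user-chosen value)
    """
    n_gray = len(hist)
    total = sum(hist)
    n_avg = total / n_gray                       # Eq. (1): average pixels per gray level
    n_cl = int(n_c * n_avg)                      # Eq. (2): actual clip limit N_C-L

    n_tc = sum(max(h - n_cl, 0) for h in hist)   # N_TC: pixels above the clip limit
    n_d = n_tc // n_gray                         # Eq. (3): pixels added to every gray level

    # Eq. (4): clip each bin, then add N_d without exceeding N_C-L
    new_hist = []
    for h in hist:
        if h > n_cl:
            new_hist.append(n_cl)
        elif h + n_d >= n_cl:
            new_hist.append(n_cl)
        else:
            new_hist.append(h + n_d)

    n_rp = total - sum(new_hist)                 # N_RP: clipped pixels not yet redistributed
    start = 0
    while n_rp > 0:
        step = max(n_gray // n_rp, 1)            # Eq. (5): S = N_graylevel / N_RP
        before = n_rp
        for i in range(start, n_gray, step):     # scan from low to high gray levels
            if n_rp == 0:
                break
            if new_hist[i] < n_cl:               # add one pixel to a bin below the limit
                new_hist[i] += 1
                n_rp -= 1
        if n_rp == before:                       # no bin on this scan could take a pixel
            if all(h >= n_cl for h in new_hist):
                break                            # clip limit too low to hold every pixel
            start = (start + 1) % step           # next scan starts at a different offset
    return new_hist


# Hypothetical 64-pixel sub-image with 4 gray levels and N_c = 1.5
print(clip_histogram([30, 20, 10, 4], 1.5))      # [24, 22, 12, 6]
```

The redistribution keeps the total pixel count unchanged (64 before and after) while no bin exceeds the clip limit of 24, which is the property Step 3 relies on before equalization.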

2.3. Feature Extraction


Feature extraction retrieves the features of the objects in an image. This research uses the Gray Level Co-occurrence Matrix (GLCM) method to extract features based on texture analysis, calculating the Contrast, Correlation, Energy, and Homogeneity values [12]. These are among the texture features proposed by Robert Haralick [13], [14]:


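1. Contrast measures the intensity differences between neighbouring pixels in the image. Contrast can be calculated using Equation (6), where $P(i, j)$ is the $(i, j)$ entry of the normalized co-occurrence matrix and $N_g$ is the number of gray levels.
$$\text{Contrast} = \sum_{n=1}^{N_g-1} n^2 \left\{ \sum_{i} \sum_{j} P(i,j) \right\}, \quad |i - j| = n \tag{6}$$

2. Correlation measures the linear dependence between the gray levels of neighbouring pixels in the image. Correlation can be calculated using Equation (7), where $\mu_i'$, $\mu_j'$ and $\sigma_i'$, $\sigma_j'$ are the means and standard deviations of the row and column marginal distributions of $P$.
$$\text{Correlation} = \frac{\sum_{i} \sum_{j} (i\,j)\,P(i,j) - \mu_i' \mu_j'}{\sigma_i' \sigma_j'} \tag{7}$$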

3. Energy (Angular Second Moment) measures the concentration of pairs in the co-occurrence matrix; it is large when a few gray-level pairs dominate. Energy can be calculated using Equation (8).
$$\text{ASM} = \sum_{i} \sum_{j} \{P(i,j)\}^2 \tag{8}$$

4. Homogeneity (Inverse Difference Moment) measures the similarity of intensity between neighbouring pixels; it is large when the matrix is concentrated along its diagonal. Homogeneity can be calculated using Equation (9).
$$\text{IDM} = \sum_{i} \sum_{j} \frac{P(i,j)}{1 + (i - j)^2} \tag{9}$$
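The four features can be illustrated with a small pure-Python sketch. The paper does not state the pixel offset, angle, symmetry, or number of gray levels used, so the choices here (horizontal offset of one pixel, symmetric counting, 4 gray levels) and the function names `glcm` and `haralick_features` are assumptions. Following Haralick's standard naming, `energy` is the angular second moment $\sum\sum P(i,j)^2$ and `homogeneity` is the inverse difference moment $\sum\sum P(i,j)/(1+(i-j)^2)$.

```python
import math


def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Normalized co-occurrence matrix P(i, j) for the pixel offset (dy, dx)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:              # count the pair in both directions
                    counts[j][i] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def haralick_features(p):
    """Contrast, correlation, energy (ASM) and homogeneity (IDM) of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    # Summing (i - j)^2 * P(i, j) over all cells equals grouping by n = |i - j| in Eq. (6)
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    # Correlation is undefined (division by zero) for a constant image
    correlation = (sum(i * j * v for i, j, v in cells) - mu_i * mu_j) / (sd_i * sd_j)
    energy = sum(v ** 2 for _, _, v in cells)
    homogeneity = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    return {"contrast": contrast, "correlation": correlation,
            "energy": energy, "homogeneity": homogeneity}


# Hypothetical 4x4 image with 4 gray levels, a small example often used to illustrate GLCMs
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
features = haralick_features(glcm(image, levels=4))
print({name: round(value, 4) for name, value in features.items()})
# {'contrast': 0.5833, 'correlation': 0.7195, 'energy': 0.1458, 'homogeneity': 0.8083}
```

For a real X-ray, the pixel values would first be quantized to `levels` gray levels, and features from several offsets are often averaged; the paper does not say whether that was done.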

2.4. Data Splitting


All extracted feature data are divided into training data and testing data, using training:testing ratios of 75%:25%, 80%:20%, 85%:15%, and 90%:10% [15]. The details of the data splitting are shown in Table 1.
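Table 1. Detail Data Splitting (number of images).
| Data | Training 75% | Testing 25% | Training 80% | Testing 20% | Training 85% | Testing 15% | Training 90% | Testing 10% |
|---|---|---|---|---|---|---|---|---|
| Covid-19 | 77 | 23 | 82 | 18 | 87 | 13 | 90 | 10 |
| Pneumonia | 73 | 27 | 78 | 22 | 83 | 17 | 90 | 10 |
| Total | 150 | 50 | 160 | 40 | 170 | 30 | 180 | 20 |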
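The paper does not describe how rows were assigned to each set. One simple possibility, sketched below purely as an assumption, is a seeded random shuffle followed by a cut at the chosen ratio. With 200 rows this yields the training and testing totals in Table 1, while the per-class counts depend on the shuffle, which is consistent with the uneven class counts in Table 1. The function name `split_data`, the seed, and the placeholder rows are hypothetical.

```python
import random


def split_data(samples, train_ratio, seed=42):
    """Shuffle the feature rows and cut them into training and testing sets.

    samples:     list of (features, label) pairs
    train_ratio: e.g. 0.80 for the 80%:20% scenario
    seed:        hypothetical fixed seed so the split is reproducible
    """
    rows = list(samples)
    random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * train_ratio)
    return rows[:n_train], rows[n_train:]


# Placeholder rows standing in for the 200 GLCM feature vectors (100 per class)
data = [((i,), "Covid-19") for i in range(100)] + [((i,), "Pneumonia") for i in range(100, 200)]
for ratio in (0.75, 0.80, 0.85, 0.90):
    train, test = split_data(data, ratio)
    print(f"{ratio:.0%}: {len(train)} training, {len(test)} testing")
```

A stratified split (shuffling each class separately) would instead give equal class counts in every set; this sketch does not do that.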

2.5. Implementation of Modified k-Nearest Neighbour


The Modified k-Nearest Neighbour (MKNN) algorithm assigns a class label to a test sample according to its k nearest training samples, using a validity value calculated for every sample in the training data. A weighted vote is then computed: the validity of each training sample is multiplied by a weight based on its Euclidean or Manhattan distance to the test sample [16]. Validating the training data in this way can improve accuracy [17]. This research uses odd values of k: 3 [18], 5 [19], 7 [20], and 9 [21].
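The validity and weighted-voting steps can be sketched in Python as follows. This is a minimal sketch, not the authors' code: it assumes the commonly cited MKNN formulation in which the validity of a training sample is the fraction of its H nearest training neighbours that share its label, and the vote weight of a neighbour is validity / (distance + 0.5). The choice H = K, the smoothing constant `alpha = 0.5`, the function names, and the toy 2-D points (standing in for the four GLCM features) are assumptions.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (city-block) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def validity(train, h, dist):
    """Share of each training sample's h nearest training neighbours with the same label."""
    scores = []
    for idx, (x, label) in enumerate(train):
        others = sorted(
            (dist(x, y), lab) for j, (y, lab) in enumerate(train) if j != idx
        )
        scores.append(sum(lab == label for _, lab in others[:h]) / h)
    return scores


def mknn_predict(train, valid, query, k, dist, alpha=0.5):
    """Weighted vote of the k nearest neighbours, weight = validity / (distance + alpha)."""
    neighbours = sorted(
        (dist(query, x), label, v) for (x, label), v in zip(train, valid)
    )[:k]
    votes = defaultdict(float)
    for d, label, v in neighbours:
        votes[label] += v / (d + alpha)
    return max(votes, key=votes.get)


# Toy 2-D points (not real GLCM features) forming two well-separated classes
train = [((0, 0), "Covid-19"), ((0, 1), "Covid-19"), ((1, 0), "Covid-19"),
         ((5, 5), "Pneumonia"), ((5, 6), "Pneumonia"), ((6, 5), "Pneumonia")]
valid = validity(train, h=3, dist=manhattan)   # H = K = 3 (assumption)
print(mknn_predict(train, valid, (0.5, 0.5), k=3, dist=manhattan))  # Covid-19
```

Passing `dist=euclidean` instead of `dist=manhattan` gives the second distance setting compared in this research; the sketch computes validity with the same distance function used for voting.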

2.6. Analysis and Evaluation


Analysis and evaluation are used to determine the accuracy of the proposed model [22]. To calculate accuracy, this research uses a confusion matrix, a table that summarizes the performance of a classification model (classifier) on a set of test data whose true labels are known [23]. Accuracy is calculated using Equation (10) [24].
$$\text{Accuracy} = \frac{\text{True Positive} + \text{True Negative}}{\text{True Positive} + \text{True Negative} + \text{False Positive} + \text{False Negative}} \times 100\% \tag{10}$$
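Equation (10) can be applied directly to the confusion matrix reported in Table 4. The short sketch below treats Covid-19 as the positive class; that choice is our assumption, since the paper does not name a positive class, and for accuracy it makes no difference.

```python
def accuracy(tp, tn, fp, fn):
    """Equation (10): percentage of test samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn) * 100


# Table 4 (K = 3, Manhattan distance), with Covid-19 as the positive class
print(accuracy(tp=15, tn=20, fp=2, fn=3))  # 87.5
```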


3. Result and Analysis


To make it easier for the classifier to distinguish pneumonia from Covid-19, the data must first be pre-processed. This research improves image quality using the Contrast Limited Adaptive Histogram Equalization (CLAHE) method. The enhancement results are shown in Figure 2.

(a) Before Enhancement (b) After Enhancement


Figure 2. Image Enhancement using CLAHE Method.

A total of 200 images were used, consisting of 100 Covid-19 X-ray images and 100 pneumonia X-ray images. After image enhancement with the CLAHE method, features are extracted using the GLCM method. The feature extraction results are shown in Table 2.
Table 2. Feature Extraction Results Using the GLCM Method.
| No | Contrast | Correlation | Energy | Homogeneity | Disease |
|---|---|---|---|---|---|
| 1 | 0.13832 | 0.98060 | 0.11196 | 0.93102 | Covid-19 |
| 2 | 0.26004 | 0.94321 | 0.11830 | 0.87663 | Covid-19 |
| ... | … | … | … | … | … |
| 100 | 0.21120 | 0.96820 | 0.10053 | 0.89957 | Covid-19 |
| 101 | 0.23424 | 0.96864 | 0.08905 | 0.89078 | Pneumonia |
| 102 | 0.17934 | 0.97358 | 0.12468 | 0.92873 | Pneumonia |
| ... | … | … | … | … | … |
| 200 | 0.22761 | 0.97249 | 0.09029 | 0.89180 | Pneumonia |

After feature extraction with the GLCM method, the images are classified using MKNN. The MKNN implementation is evaluated under several scenarios to find the highest accuracy, using K = 3, 5, 7, and 9 and comparing Euclidean and Manhattan distance. The test accuracies are shown in Table 3.
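Table 3. Accuracy Comparison (accuracy in %, by training:testing split).
| Distance | K | 75%:25% | 80%:20% | 85%:15% | 90%:10% |
|---|---|---|---|---|---|
| Manhattan | 3 | 82 | 87.5 | 86.67 | 75 |
| Manhattan | 5 | 72 | 80 | 80 | 70 |
| Manhattan | 7 | 72 | 77.5 | 76.67 | 65 |
| Manhattan | 9 | 72 | 77.5 | 76.67 | 65 |
| Euclidean | 3 | 70 | 77.5 | 76.67 | 70 |
| Euclidean | 5 | 70 | 72.5 | 73.33 | 60 |
| Euclidean | 7 | 70 | 72.5 | 70 | 60 |
| Euclidean | 9 | 70 | 72.5 | 70 | 60 |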
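The values in Table 3 can be re-encoded to check the conclusions drawn about K. The illustrative script below only restates the published accuracies and tests three things: which configuration is best, whether K = 7 and K = 9 always agree, and whether accuracy never increases as K grows.

```python
SPLITS = ["75%:25%", "80%:20%", "85%:15%", "90%:10%"]
KS = [3, 5, 7, 9]
DISTANCES = ["Manhattan", "Euclidean"]

# Accuracy (%) from Table 3, keyed by (distance, K); one value per split in SPLITS
TABLE3 = {
    ("Manhattan", 3): [82, 87.5, 86.67, 75],
    ("Manhattan", 5): [72, 80, 80, 70],
    ("Manhattan", 7): [72, 77.5, 76.67, 65],
    ("Manhattan", 9): [72, 77.5, 76.67, 65],
    ("Euclidean", 3): [70, 77.5, 76.67, 70],
    ("Euclidean", 5): [70, 72.5, 73.33, 60],
    ("Euclidean", 7): [70, 72.5, 70, 60],
    ("Euclidean", 9): [70, 72.5, 70, 60],
}

# Best configuration over all distances, K values and splits
best = max((acc, dist, k, SPLITS[s])
           for (dist, k), row in TABLE3.items()
           for s, acc in enumerate(row))

# K = 7 and K = 9 give identical accuracy for both distances and all splits
k7_equals_k9 = all(TABLE3[(d, 7)] == TABLE3[(d, 9)] for d in DISTANCES)

# Accuracy never increases as K grows, for every distance and split
non_increasing = all(
    TABLE3[(d, a)][s] >= TABLE3[(d, b)][s]
    for d in DISTANCES
    for s in range(len(SPLITS))
    for a, b in zip(KS, KS[1:])
)

print(best)            # (87.5, 'Manhattan', 3, '80%:20%')
print(k7_equals_k9)    # True
print(non_increasing)  # True
```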

Table 3 shows that the highest test accuracy, 87.5%, is obtained with K = 3, Manhattan distance, and the 80%:20% split. Overall, accuracy decreases or stays the same as K increases, and K = 7 and K = 9 give identical accuracy in every scenario, so increasing K beyond 7 has no further effect. The test results for the best configuration are detailed in the confusion matrix in Table 4.
Table 4. Confusion Matrix for K = 3 using Manhattan Distance.
| Actual \ Prediction | Covid-19 | Pneumonia |
|---|---|---|
| Covid-19 | 15 | 3 |
| Pneumonia | 2 | 20 |

Table 4 shows that the largest error occurred in the Covid-19 data: 3 of the 18 testing images (16.67%) were misclassified as pneumonia. For the pneumonia data, 2 of the 22 testing images (9.09%) were misclassified as Covid-19. This shows that the classifier still makes errors on both Covid-19 and pneumonia data.
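The per-class error rates quoted above follow from the rows of Table 4. The small sketch below recomputes them; the function name `error_rates` and the dictionary layout are illustrative choices, with rows as actual classes and columns as predictions.

```python
def error_rates(confusion):
    """Fraction of each actual class that was predicted as some other class."""
    rates = {}
    for actual, row in confusion.items():
        total = sum(row.values())
        rates[actual] = (total - row[actual]) / total
    return rates


# Confusion matrix from Table 4: rows are actual classes, columns are predictions
confusion = {
    "Covid-19": {"Covid-19": 15, "Pneumonia": 3},
    "Pneumonia": {"Covid-19": 2, "Pneumonia": 20},
}
for label, rate in error_rates(confusion).items():
    print(f"{label}: {rate:.2%}")
# Covid-19: 16.67%
# Pneumonia: 9.09%
```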

4. Conclusion
This research used 200 images: 100 Covid-19 X-ray images and 100 pneumonia X-ray images. The quality of the collected images was improved using the CLAHE method. Features were then extracted using the GLCM method and used as input to the MKNN classifier. The highest accuracy, 87.5%, was obtained with K = 3, Manhattan distance, and an 80%:20% training:testing split. Overall, accuracy decreased or stayed the same as K increased, and it was identical for K = 7 and K = 9 in every scenario.

References
[1] N. Yudistira, A. W. Widodo, and B. Rahayudi, “Deteksi Covid-19 pada Citra Sinar-X Dada
Menggunakan Deep Learning yang Efisien,” J. Teknol. Inf. dan Ilmu Komput., vol. 7, no. 6,
pp. 1289–1296, 2020.
[2] S. M. Ilpaj and N. Nurwati, “Analisis Pengaruh Tingkat Kematian Akibat Covid-19 Terhadap
Kesehatan Mental Masyarakat Di Indonesia,” J. Pekerj. Sos., vol. 3, no. 1, pp. 16–28, 2020.
[3] A. P. Setiadi, Y. I. Wibowo, S. V. Halim, C. Brata, B. Presley, and E. Setiawan, “Tata Laksana
Terapi Pasien dengan COVID-19: Sebuah Kajian Naratif,” J. Farm. Klin. Indones., vol. 9, no.
1, pp. 70–94, 2020.
[4] C. Farmawati, M. Ula, and Q. Qomariyah, “Prevention of COVID-19 by Strengthening Body’s
Immune System through Self-Healing,” Populasi, vol. 28, no. 2, pp. 70–81, 2020.
[5] Y. D. Alfiyanti, D. E. Ratnawati, and S. Anam, “Klasifikasi Fungsi Senyawa Aktif Berdasarkan
Data Simplified Molecular Input Line Entry System (SMILES) Menggunakan Metode
Modified K-Nearest Neighbour,” J. Pengemb. Teknol. Inf. dan Ilmu Komput., vol. 3, no. 4, pp.
3244–3251, 2019.
[6] F. N. Rozi and D. H. Sulistyawati, “Klasifikasi Berita Hoax Pilpres Menggunakan Metode
Modified k-Nearest Neighbor dan Pembobotan Menggunakan TF-IDF,” Konvergensi, vol. 15,
no. 1, pp. 1–10, 2019.
[7] M. Thohir, A. Z. Foeady, D. C. R. Novitasari, A. Z. Arifin, B. Y. Phiadelvira, and A. H. Asyhar,
“Classification of Colposcopy Data Using GLCM-SVM on Cervical Cancer,” in International
Conference on Artificial Intelligence in Information and Communication, 2020, pp. 373–378.
[8] T. Ozturk, M. Talo, E. A. Yildirim, U. B. Baloglu, O. Yildirim, and U. R. Acharya, “Automated
Detection of COVID-19 Cases using Deep Neural Networks with X-Ray Images,” Comput.
Biol. Med., vol. 121, 2020.
[9] M. M. Sebatubun, “Peningkatan Kualitas Citra X-Ray Paru-paru Menggunakan Contrast Limited
Adaptive Histogram Equalization dan Gaussian Filter,” in Seminar Riset Teknologi Informasi
(SRITI), 2016, pp. 241–247.


[10] A. I. Zakaria, E. Ernawati, A. Vatresia, and W. K. Oktoeberza, “Perbandingan Metode High-
Frequency Emphasis (HFE) Dan Contrast Limited Adaptive Histogram Equalization
(CLAHE) Dalam Perbaikan Kualitas Citra Penginderaan Jauh (Remote Sensing),” J.
Pseudocode, vol. 6, no. 2, pp. 125–137, 2019.
[11] C. Ramya and S. Subha Rani, “A Novel Method for the Contrast Enhancement of Fog Degraded
Video Sequences,” Int. J. Comput. Appl., vol. 54, no. 13, pp. 1–5, 2012.
[12] M. Elisiana, U. D. Rosiani, and K. S. Batubulan, “Identifikasi ‘Acne Vulgaris’ berdasarkan Fitur
Warna dan Tekstur Menggunakan Klasifikasi JST Backpropagation,” J. Inform. Polinema,
vol. 7, no. 2, pp. 7–12, 2021.
[13] M. I. As Sauri, A. W. Widodo, and O. M. Luthfi, “Klasifikasi Genus Karang Keras (Scleractinia)
dengan Metode Gray Level Co-Occurrence Matrix,” J. Pengemb. Teknol. Inf. dan Ilmu
Komput., vol. 3, no. 6, pp. 5397–5405, 2019.
[14] F. Y. Manik, K. Saputra S, and D. S. Br Ginting, “Plant Classification Based on Extraction Feature
Gray Level Co-Occurrence Matrix Using k-Nearest Neighbour,” in Journal of Physics:
Conference Series, 2020, vol. 1566, no. 1, pp. 1–9.
[15] C. Menni et al., “Real-time tracking of self-reported symptoms to predict potential COVID-19,”
Nat. Med., vol. 26, no. 7, pp. 1037–1040, 2020.
[16] F. Wafiyah, N. Hidayat, and R. S. Perdana, “Implementasi Algoritma Modified K-Nearest
Neighbor (MKNN) untuk Klasifikasi Penyakit Demam,” J. Pengemb. Teknol. Inf. dan Ilmu
Komput., vol. 1, no. 10, pp. 1210–1219, 2017.
[17] S. I. Fernanda, D. E. Ratnawati, and P. P. Adikara, “Identifikasi Penyakit Diabetes Mellitus
Menggunakan Metode Modified K-Nearest Neighbor (MKNN),” J. Pengemb. Teknol. Inf.
dan Ilmu Komput., vol. 1, no. 6, pp. 507–513, 2017.
[18] W. Wahyono, I. N. P. Trisna, S. L. Sariwening, M. Fajar, and D. Wijayanto, “Comparison of
distance measurement on k-nearest neighbour in textual data classification,” J. Teknol. dan
Sist. Komput., vol. 8, no. 1, pp. 54–58, 2020.
[19] L. Farokhah, “Implementasi K-Nearest Neighbor Untuk Klasifikasi Bunga Dengan Ekstraksi
Fitur Warna RGB,” J. Teknol. Inf. dan Ilmu Komput., vol. 7, no. 6, pp. 1129–1136, 2020.
[20] R. Sari, “Analisis Sentimen Pada Review Objek Wisata Dunia Fantasi Menggunakan Algoritma
K-Nearest Neighbor (K-NN),” EVOLUSI J. Sains dan Manaj., vol. 8, no. 1, pp. 10–17, 2020.
[21] M. A. Maricar and D. Pramana, “Perbandingan Akurasi Naïve Bayes dan K-Nearest Neighbor
pada Klasifikasi untuk Meramalkan Status Pekerjaan Alumni ITB STIKOM Bali,” J. Sist. dan
Inform., vol. 14, no. 1, pp. 16–22, 2019.
[22] L. A. Utami, “Analisis Sentimen Opini Publik Berita Kebakaran Hutan Melalui Komparasi
Algoritma Support Vector Machine dan K-Nearest Neighbor Berbasis Particle Swarm
Optimization,” J. Pilar Nusa Mandiri, vol. 13, no. 1, pp. 103–112, 2017.
[23] E. Sutoyo and A. Almaarif, “Educational Data Mining untuk Prediksi Kelulusan Mahasiswa
Menggunakan Algoritme Naïve Bayes Classifier,” J. Rekayasa Sist. dan Teknol. Inf., vol. 4,
no. 1, pp. 95–101, 2020.
[24] F. Y. Manik and K. S. Saragih, “Klasifikasi Belimbing Menggunakan Naïve Bayes Berdasarkan
Fitur Warna RGB,” IJCCS (Indonesian J. Comput. Cybern. Syst.), vol. 11, no. 1, pp. 99–108,
2017.

Acknowledgments
The authors would like to give deep gratitude to Lembaga Penelitian dan Pengabdian kepada Masyarakat
Universitas Negeri Medan for funding this research through the PNBP 2021.
