Kana Saputra S, Insan Taufik, Mhd Hidayat, Dinda Farahdilla Dharma
Study Program of Computer Science, Faculty of Mathematics and Natural Sciences,
Universitas Negeri Medan, North Sumatera, Indonesia
kanasaputras@unimed.ac.id; insantaufik@unimed.ac.id;
mhdhidayat@mhs.unimed.ac.id; dindafarahdilla@mhs.unimed.ac.id
Abstract. Covid-19, first discovered in China, causes mild to severe respiratory infections such
as pneumonia. Pneumonia is inflammation and consolidation of lung tissue due to infectious
agents. Pneumonia generally has a high mortality rate, as does Covid-19. At present it is very
difficult to distinguish between Pneumonia and Covid-19 because their X-Ray images are highly
similar. This research aims to differentiate Pneumonia and Covid-19 patients based on texture
analysis of the Gray Level Co-Occurrence Matrix, using Modified k-Nearest Neighbour as the
classifier. The Gray Level Co-Occurrence Matrix features used are Contrast, Correlation, Energy,
and Homogeneity, which serve as input to the Modified k-Nearest Neighbour classifier. The
results show that the highest accuracy, 87.5%, is obtained with K = 3 using Manhattan Distance
and an 80%:20% data split. Accuracy is the same for K = 7 and K = 9, so K affects accuracy only
at the smaller values tested, and accuracy does not increase as K grows.
1. Introduction
The world is currently experiencing one of the biggest health disasters in human history, caused by the
coronavirus. Coronavirus Disease 2019 (Covid-19) was first discovered in Hubei Province, China,
through reports of a type of pneumonia whose cause was not yet known [1]. According to data from the
World Health Organization (WHO), as of March 11, 2021, there were 117,799,584 confirmed cases of
Covid-19 and 2,615,018 deaths spread across 223 countries. For Indonesia, the Covid-19 Task Force
reported 1,403,722 confirmed positive cases and 38,049 deaths as of March 11, 2021. Covid-19 causes
not only mild respiratory infections but also severe respiratory infections such as pneumonia, and
can even lead to death [2].
The disease caused by Covid-19 has symptoms very similar to those of pneumonia. Pneumonia is a
disease in which the lungs become inflamed due to pathogenic infection [3]. Pneumonia caused by
Covid-19 is slightly different from pneumonia in general. In pneumonia, the inflammation can cause
the air sacs of the lungs to become infected and fill with fluid. However, pneumonia can heal by
itself if the patient's immune system is good. Covid-19 generally attacks the upper respiratory
tract, from which it can eventually spread to the lungs, and it can cause blockages in the
respiratory organs [4]. This study aims to distinguish Pneumonia and Covid-19 patients based on
texture analysis of the Gray-Level Co-Occurrence Matrix (GLCM) using Modified k-Nearest Neighbour
(MKNN) as a classifier.
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd
ICOSTA 2021 IOP Publishing
Journal of Physics: Conference Series 2193 (2022) 012070 doi:10.1088/1742-6596/2193/1/012070
MKNN is a classification method. It has been applied to classify the function of active compounds
based on Simplified Molecular Input Line Entry System (SMILES) codes, where testing gave an accuracy
of 73% with K = 3 [5]. MKNN has also been used to classify hoax news in the presidential election,
with a Precision of 93.75%, a Recall of 90.90%, and an accuracy of 92.30% [6]. GLCM has been applied
for feature extraction of cervical cancer colposcopy data, with a support vector machine as the
classifier, reaching an accuracy of 90% [7]. These studies indicate that GLCM works well for feature
extraction and that MKNN works well as a classifier.
Based on this, this research applies MKNN to distinguish the lungs of Covid-19 patients from the
lungs of pneumonia patients. Features are extracted with a texture analysis approach using the GLCM
method. The features Contrast, Correlation, Energy, and Homogeneity are the input for the MKNN
classifier.
2. Research Method
The stages of this research are shown in Figure 1.
[Figure 1. Flowchart of the research stages: Data Collection; Data Preprocessing; Data Enhancement;
Feature Extraction (Gray Level Co-occurrence Matrix); Data Splitting; Modified KNN Model.]
Emphasis (HFE) method in improving image quality [10]. The stages of the CLAHE method are as
follows [9], [11]:
In the original histogram, pixels are clipped if the number of pixels is greater
than $N_{C-L}$. The clipped pixels are distributed evenly over the gray levels; the
number of pixels added to each gray level, $N_d$, is defined from the total number of
clipped pixels $N_{TC}$ as formulated in Equation (3).
$$N_d = \frac{N_{TC}}{N_{graylevel}} \qquad (3)$$
$H_{SI}(i)$ is the number of pixels at gray level $i$ of the sub-image. Using
Equation (3), the contrast-limited histogram of the sub-image, $H_{NSI}(i)$, can be
calculated using Equation (4).
$$H_{NSI}(i) = \begin{cases} N_{C-L}, & \text{if } H_{SI}(i) > N_{C-L} \\ N_{C-L}, & \text{else if } H_{SI}(i) + N_d \geq N_{C-L} \\ H_{SI}(i) + N_d, & \text{else} \end{cases} \qquad (4)$$
At the end of the distribution in Equation (4), the remaining number of clipped pixels is
expressed as $N_{RP}$, and the pixel distribution step is formulated in Equation (5).
$$S = \frac{N_{graylevel}}{N_{RP}} \qquad (5)$$
This method scans the gray levels from the minimum to the maximum value.
If the pixel count of a gray level is less than $N_{C-L}$, the method assigns one
pixel to that gray level. If the scan ends before all pixels are distributed, the step
is recalculated with Equation (5) and a new scan starts, until all pixels are
distributed. Thus a new histogram is obtained.
Step 4 : The contrast-limited histogram of each sub-image is processed with histogram
equalization (HE). Next, the pixels of the sub-images are mapped using linear interpolation.
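The clipping and redistribution of Equations (3) to (5) can be sketched for a single sub-image histogram as follows. This is a minimal illustration under stated assumptions, not the implementation used in this research: the names `clip_histogram` and `_sweep`, the integer division used for $N_d$ and $S$, the fallback sweep over every gray level, and the sample histogram are all hypothetical choices made for the example.

```python
import numpy as np

def _sweep(hist, clip_limit, step, budget):
    """Give one pixel to each visited bin below the clip limit; return pixels given."""
    given = 0
    for i in range(0, hist.size, step):
        if given == budget:
            break
        if hist[i] < clip_limit:
            hist[i] += 1
            given += 1
    return given

def clip_histogram(hist, clip_limit):
    """Clip a sub-image histogram and redistribute the clipped pixels."""
    hist = np.asarray(hist, dtype=np.int64)
    n_gray = hist.size
    # N_TC: total number of pixels above the clip limit.
    n_tc = int(np.maximum(hist - clip_limit, 0).sum())
    # Equation (3), using integer division (an assumption of this sketch).
    n_d = n_tc // n_gray
    # Equation (4): all three cases reduce to min(H_SI(i) + N_d, clip limit).
    new_hist = np.minimum(hist + n_d, clip_limit)
    # N_RP: clipped pixels that are still undistributed.
    remaining = int(hist.sum() - new_hist.sum())
    while remaining > 0:
        step = max(n_gray // remaining, 1)  # Equation (5)
        given = _sweep(new_hist, clip_limit, step, remaining)
        if given == 0 and step > 1:
            # Fallback (assumption): visit every gray level once.
            given = _sweep(new_hist, clip_limit, 1, remaining)
        if given == 0:
            break  # every bin is already at the clip limit
        remaining -= given
    return new_hist

# Hypothetical 4-level histogram, for illustration only.
sample = np.array([10, 2, 0, 0])
clipped = clip_histogram(sample, clip_limit=4)
```

The sketch keeps the total pixel count unchanged whenever the bins have enough room below the clip limit, which is the property the redistribution step is meant to preserve.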
1. Contrast measures the difference in intensity in the image. Contrast can be
calculated using Equation (6), where the inner sum runs over the pairs with $|i - j| = n$.
$$Contrast = \sum_{n} n^{2} \Big\{ \sum_{i} \sum_{j} P(i,j) \Big\}, \quad |i - j| = n \qquad (6)$$
2. Correlation measures the linear dependence of the gray levels of neighbouring pixels in the
gray image. Correlation can be calculated using Equation (7).
$$Correlation = \frac{\sum_{i} \sum_{j} (ij)\,P(i,j) - \mu_i' \mu_j'}{\sigma_i' \sigma_j'} \qquad (7)$$
3. Energy (Angular Second Moment) measures the concentration of pairs in the co-occurrence
matrix. Energy can be calculated using Equation (8).
$$ASM = \sum_{i} \sum_{j} \{P(i,j)\}^{2} \qquad (8)$$
4. Homogeneity (Inverse Difference Moment) measures the similarity of intensity
variations in the image. Homogeneity can be calculated using Equation (9).
$$IDM = \sum_{i} \sum_{j} \frac{P(i,j)}{1 + (i - j)^{2}} \qquad (9)$$
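The four texture features can be sketched with NumPy as follows. This is an illustrative sketch, not the code used in this research. It assumes a single horizontal offset of one pixel, a symmetric and normalized co-occurrence matrix, Energy computed as the sum of squared matrix entries, and Homogeneity computed as the inverse-difference-weighted sum. The function names `glcm` and `glcm_features` and the 4-level test image are hypothetical.

```python
import numpy as np

def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalized co-occurrence matrix P(i, j) for offset (dx, dy)."""
    image = np.asarray(image)
    rows, cols = image.shape
    counts = np.zeros((levels, levels), dtype=np.float64)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r, c], image[r2, c2]] += 1
    counts += counts.T            # count each pair in both directions
    return counts / counts.sum()  # normalize so the entries sum to 1

def glcm_features(p):
    """Contrast, Correlation, Energy and Homogeneity of a normalized GLCM."""
    i, j = np.indices(p.shape)
    # Summing n^2 over pairs with |i - j| = n equals summing (i - j)^2 * P(i, j).
    contrast = np.sum((i - j) ** 2 * p)
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sigma_i = np.sqrt(np.sum((i - mu_i) ** 2 * p))
    sigma_j = np.sqrt(np.sum((j - mu_j) ** 2 * p))
    correlation = (np.sum(i * j * p) - mu_i * mu_j) / (sigma_i * sigma_j)
    energy = np.sum(p ** 2)                       # Angular Second Moment
    homogeneity = np.sum(p / (1 + (i - j) ** 2))  # Inverse Difference Moment
    return contrast, correlation, energy, homogeneity

# Hypothetical 4-level image, for illustration only.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 2, 2, 2],
                  [2, 2, 3, 3]])
p = glcm(image, levels=4)
contrast, correlation, energy, homogeneity = glcm_features(p)
```

For an X-Ray image, the gray levels would first be quantized to the chosen number of levels; the offset and the number of levels are parameters that this sketch leaves open.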
The data used are 200 images, consisting of 100 Covid-19 X-Ray images and 100
Pneumonia X-Ray images. After image enhancement using the CLAHE method, the next step is to
perform feature extraction using the GLCM method. The results of feature extraction using the GLCM
method are shown in Table 2.
Table 2. Feature Extraction Results Using the GLCM Method.
No Contrast Correlation Energy Homogeneity Disease
1 0.13832 0.98060 0.11196 0.93102 Covid-19
2 0.26004 0.94321 0.11830 0.87663 Covid-19
... … … … … …
100 0.21120 0.96820 0.10053 0.89957 Covid-19
101 0.23424 0.96864 0.08905 0.89078 Pneumonia
102 0.17934 0.97358 0.12468 0.92873 Pneumonia
... … … … … …
200 0.22761 0.97249 0.09029 0.89180 Pneumonia
After feature extraction using the GLCM method, the next step is classification using
MKNN. The MKNN implementation is tested in several scenarios to find the highest accuracy. This
research uses the parameter values K = 3, K = 5, K = 7, and K = 9 and compares Euclidean and
Manhattan Distance. The accuracy of the test results is shown in Table 3.
Table 3. Accuracy Comparison (accuracy in %, by data split).
Distance    K   75%:25%   80%:20%   85%:15%   90%:10%
Manhattan   3   82        87.5      86.67     75
Manhattan   5   72        80        80        70
Manhattan   7   72        77.5      76.67     65
Manhattan   9   72        77.5      76.67     65
Euclidean   3   70        77.5      76.67     70
Euclidean   5   70        72.5      73.33     60
Euclidean   7   70        72.5      70        60
Euclidean   9   70        72.5      70        60
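The scenarios compared in Table 3 vary K and the distance measure of the MKNN classifier. The sketch below follows the commonly published form of MKNN, in which each training sample receives a validity score from its own neighbours and each vote is weighted by validity / (distance + 0.5). It is an assumption-laden illustration, not the implementation used in this research: the function names, the use of the same K for the validity step, the constant `alpha=0.5`, and the toy data are all hypothetical.

```python
import numpy as np

def pairwise_distance(a, b, metric="manhattan"):
    """Distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    if metric == "manhattan":
        return np.abs(diff).sum(axis=2)
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=2))
    raise ValueError("metric must be 'manhattan' or 'euclidean'")

def mknn_predict(x_train, y_train, x_test, k=3, metric="manhattan", alpha=0.5):
    """Predict labels with validity-weighted k-nearest-neighbour voting."""
    x_train = np.asarray(x_train, dtype=float)
    x_test = np.asarray(x_test, dtype=float)
    y_train = np.asarray(y_train)

    # Validity: fraction of a training sample's k nearest training
    # neighbours (excluding itself) that share its label.
    d_train = pairwise_distance(x_train, x_train, metric)
    np.fill_diagonal(d_train, np.inf)
    neighbours = np.argsort(d_train, axis=1)[:, :k]
    validity = (y_train[neighbours] == y_train[:, None]).mean(axis=1)

    # Weighted voting: each neighbour votes with validity / (distance + alpha).
    d_test = pairwise_distance(x_test, x_train, metric)
    classes = np.unique(y_train)
    predictions = []
    for row in d_test:
        nearest = np.argsort(row)[:k]
        weights = validity[nearest] / (row[nearest] + alpha)
        scores = [weights[y_train[nearest] == c].sum() for c in classes]
        predictions.append(classes[int(np.argmax(scores))])
    return np.array(predictions)

# Toy two-feature data, for illustration only (not the GLCM features of Table 2).
x_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = ["A", "A", "A", "B", "B", "B"]
x_test = [[0.2, 0.2], [5.5, 5.5]]
pred_manhattan = mknn_predict(x_train, y_train, x_test, k=3, metric="manhattan")
pred_euclidean = mknn_predict(x_train, y_train, x_test, k=3, metric="euclidean")
```

In the setting of this research, each row of `x_train` would hold the four GLCM features of one image and the labels would be Covid-19 or Pneumonia.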
Table 3 shows that the highest accuracy, 87.5%, is obtained with K = 3 using Manhattan Distance and
an 80%:20% data split. Accuracy is the same for K = 7 and K = 9, so K affects accuracy only at the
smaller values tested, and accuracy does not increase as K grows. Details of the test results for
the highest accuracy are shown as a Confusion Matrix in Table 4.
Table 4. Confusion Matrix for K = 3 using Manhattan Distance.
Actual \ Prediction   Covid-19   Pneumonia
Covid-19              15         3
Pneumonia             2          20
Table 4 shows that the most errors occurred in the Covid-19 data: 3 (16.67%) of the 18 testing
images were misclassified. For the Pneumonia data, 2 (9.09%) of the 22 testing images were
misclassified. This shows that the classification process still produces errors for both Covid-19
and Pneumonia data.
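The accuracy and per-class error rates can be recomputed directly from the counts in Table 4. The short sketch below uses only those four counts; the variable names are illustrative.

```python
import numpy as np

# Rows: actual class (Covid-19, Pneumonia); columns: predicted class.
confusion = np.array([[15, 3],
                      [2, 20]])

total = confusion.sum()                    # 40 testing images
accuracy = np.trace(confusion) / total     # correct predictions / all predictions
per_class_total = confusion.sum(axis=1)    # 18 Covid-19, 22 Pneumonia
per_class_error = 1 - np.diag(confusion) / per_class_total

print(f"accuracy: {accuracy:.2%}")
print(f"Covid-19 error: {per_class_error[0]:.2%}")
print(f"Pneumonia error: {per_class_error[1]:.2%}")
```

The 40 testing images correspond to 20% of the 200 images, and 35 correct predictions out of 40 give the reported 87.5%.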
4. Conclusion
The data used are 200 images, consisting of 100 Covid-19 X-Ray images and 100
Pneumonia X-Ray images. The quality of the collected images was improved using the CLAHE method.
After that, features were extracted using the GLCM method and used as input for the
MKNN classifier. The results show that the highest accuracy, 87.5%, is obtained with K = 3 using
Manhattan Distance and an 80%:20% data split. Accuracy is the same for K = 7 and K = 9, so K
affects accuracy only at the smaller values tested, and accuracy does not increase as K grows.
References
[1] N. Yudistira, A. W. Widodo, and B. Rahayudi, “Deteksi Covid-19 pada Citra Sinar-X Dada
Menggunakan Deep Learning yang Efisien,” J. Teknol. Inf. dan Ilmu Komput., vol. 7, no. 6,
pp. 1289–1296, 2020.
[2] S. M. Ilpaj and N. Nurwati, “Analisis Pengaruh Tingkat Kematian Akibat Covid-19 Terhadap
Kesehatan Mental Masyarakat Di Indonesia,” J. Pekerj. Sos., vol. 3, no. 1, pp. 16–28, 2020.
[3] A. P. Setiadi, Y. I. Wibowo, S. V. Halim, C. Brata, B. Presley, and E. Setiawan, “Tata Laksana
Terapi Pasien dengan COVID-19: Sebuah Kajian Naratif,” J. Farm. Klin. Indones., vol. 9, no.
1, pp. 70–94, 2020.
[4] C. Farmawati, M. Ula, and Q. Qomariyah, “Prevention of COVID-19 by Strengthening Body’s
Immune System through Self-Healing,” Populasi, vol. 28, no. 2, pp. 70–81, 2020.
[5] Y. D. Alfiyanti, D. E. Ratnawati, and S. Anam, “Klasifikasi Fungsi Senyawa Aktif Berdasarkan
Data Simplified Molecular Input Line Entry System (SMILES) Menggunakan Metode
Modified K-Nearest Neighbour,” J. Pengemb. Teknol. Inf. dan Ilmu Komput., vol. 3, no. 4, pp.
3244–3251, 2019.
[6] F. N. Rozi and D. H. Sulistyawati, “Klasifikasi Berita Hoax Pilpres Menggunakan Metode
Modified k-Nearest Neighbor dan Pembobotan Menggunakan TF-IDF,” Konvergensi, vol. 15,
no. 1, pp. 1–10, 2019.
[7] M. Thohir, A. Z. Foeady, D. C. R. Novitasari, A. Z. Arifin, B. Y. Phiadelvira, and A. H. Asyhar,
“Classification of Colposcopy Data Using GLCM-SVM on Cervical Cancer,” in International
Conference on Artificial Intelligence in Information and Communication, 2020, pp. 373–378.
[8] T. Ozturk, M. Talo, E. A. Yildirim, U. B. Baloglu, O. Yildirim, and U. R. Acharya, “Automated
Detection of COVID-19 Cases using Deep Neural Networks with X-Ray Images,” Comput.
Biol. Med., vol. 121, 2020.
[9] M. M. Sebatubun, “Peningkatan Kualitas Citra X-Ray Paru - paru Menggunakan Contrast Limited
Adaptive Histogram Equalization dan Gaussian Filter,” in Seminar Riset Teknologi Informasi
Acknowledgments
The authors would like to express their deep gratitude to Lembaga Penelitian dan Pengabdian kepada
Masyarakat Universitas Negeri Medan for funding this research through the PNBP 2021.