
Visualizing COVID-19 Vaccination Rate and Vaccination Centre in Malaysia using DBSCAN Clustering Model
Sofiyah Insyirah Mohd Said
Faculty of Computer and Mathematical Sciences
MARA University of Technology, Malacca Branch (Jasin Campus)
Malacca, Malaysia

Raihah Aminuddin
College of Computing, Informatics and Media
MARA University of Technology, Malacca Branch (Jasin Campus)
Malacca, Malaysia
raihah1@uitm.edu.my

Nor Afirdaus Zainal Abidin
College of Computing, Informatics and Media
MARA University of Technology, Malacca Branch (Jasin Campus)
Malacca, Malaysia
*afirdaus@uitm.edu.my

Siti Diana Nabilah Mohd Nasir
Kingston University
London, United Kingdom
s.mohdnasir@kingston.ac.uk

Asma Zubaida Mohamed Ibrahim
City University
Selangor, Malaysia
asma.zubaida@city.edu.my

2022 IEEE International Power and Renewable Energy Conference (IPRECON) | 978-1-6654-9175-4/22/$31.00 ©2022 IEEE | DOI: 10.1109/IPRECON55716.2022.10059495

Abstract—In February 2021, the Malaysian government launched a vaccination campaign against coronavirus disease 2019 (COVID-19). However, it is difficult to identify suitable locations for vaccination centres. At the same time, part of the population lives in rural areas and has difficulty travelling to the nearest vaccination centre. Therefore, based on the vaccination rate data collected by the Ministry of Health, the proposed project aims to classify and visualise the data on COVID-19 vaccination rates and vaccination centres in Malaysia for the adult and adolescent populations. This project uses a machine learning technique called Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The back end of the system is developed in Python, and PyCharm is used to develop the web-based front end. This project follows four phases of the Waterfall model: requirement analysis, design, implementation, and testing. The system is evaluated for functionality, for usability based on user satisfaction, and for the accuracy of the model. The test results show that all the functionality of the system has been implemented successfully. The system is also rated good in usability testing, with a score of 88.5% on the SUS questionnaire. The machine learning model also achieved a good accuracy score, which is greater than 0.3. In conclusion, the web-based data visualization application helps the Malaysian government to identify strategic locations for additional vaccination centres, and it helps Malaysian people to locate nearby vaccination centres in their area.
Index Terms—COVID-19, vaccination centre, clustering, DBSCAN, data visualization, web-based

I. INTRODUCTION
On February 24, 2021, the Prime Minister of Malaysia, Muhyiddin Yassin, became the first person in Malaysia to receive a COVID-19 vaccination injection [1]. Since then, the Malaysian government has been promoting and distributing COVID-19 vaccines to the Malaysian people. The vaccine may reduce the risk of transmission of COVID-19 and helps to protect individuals from becoming sick [2]. The vaccine acts as a booster to the immune system and as a safeguard against later infection or disease. However, in order to get vaccinated, people must book an appointment and go to the nearest vaccination centre. The government also provides another option for those who live in rural areas: booking an appointment at a mobile vaccination centre [3]. The mobile vaccination centre also serves individuals who are disabled and cannot travel far to get vaccinated [4]. The advantage of the mobile vaccination centre is that it can be relocated based on the demand of the people.

A. Problem statements
Two problems have been identified in supporting the effort of the Malaysian government to provide vaccines to the Malaysian people. Firstly, there is a limited number of vaccination centres, and some people lack information about the location of the centres.
Secondly, travel is costly for individuals who need to travel a long distance to the centre. Most people will cancel their appointment because the vaccination centre is too far away and they have transportation issues, such as no public transport and no personal vehicle. Another serious issue is the time interval between the first and second doses. Because it is difficult to reach the nearest centre, an individual who has received the first dose might wait longer for the second dose. This interferes with the effect of the vaccine on the body, because each vaccine has a specific window of time between the first and second doses that must be followed in order to produce the best outcomes.



Authorized licensed use limited to: Hochschule HS Dresden. Downloaded on June 11,2023 at 13:56:40 UTC from IEEE Xplore. Restrictions apply.
B. Aims
The project aims to create a web-based dashboard that visualises vaccination rates in order to monitor areas where people in Malaysia have not received at least two doses of COVID-19 vaccine. The main significance of this project is that the system may help the Malaysian government to decide where to build and locate mobile vaccination centres based on local demand. It can also help Malaysian people to find the vaccination centre nearest to their residence.
There are two objectives in this project:
1) To design and develop a web-based application that maps vaccination rate data using a machine learning model.
2) To test the functionality and usability of the system and the accuracy of the machine learning model.
The paper is organized as follows:
• Literature review and methodology are presented in Section 2 and Section 3, respectively.
• Limitations of the project and future work are presented in Section 4.
• The conclusion is given in Section 5.

II. LITERATURE REVIEW
Data mining is a modern technique used to detect or identify interesting patterns in data [5].

A. Clustering Technique
Clustering is a data mining technique in which the data are grouped or the pattern of the dataset is ascertained. The clustering technique reveals the structure of an unlabelled set of data. A cluster is a group of items that are similar to one another and unlike the items in other clusters. The clustering technique can also be used to find outliers or abnormalities in a dataset. The advantage of this technique is that the data scientist can learn about the dataset and use it as a starting point for learning a new dataset. For this project, DBSCAN was chosen as the clustering model.
1) DBSCAN: DBSCAN is a widely used density-based clustering method that groups data points according to the distances between them [6]. Two parameters are essential in this model: epsilon and minimum points (minPts). Epsilon is the radius of the neighbourhood around a point, usually measured as Euclidean distance in the feature space. minPts is the minimum number of points that must lie within epsilon of a point for that point to count as a core point. The model picks an unvisited point and determines whether it is a core point, that is, whether at least minPts points lie within its epsilon radius. The points that are density-connected to a core point are then added to its cluster. During the visiting process, points that are reachable from a core point but have fewer than minPts neighbours are referred to as border points. Once the visiting process is finished, any point that is not reachable from a core point is labelled a noise point, or outlier, and belongs to no cluster.

B. Data Visualization
Data visualisation is the process of displaying existing data as an infographic, map, or table [7]. Software programmes and machine learning models can be used to build patterns for data visualisation such as scatter plots, pie charts, bar charts, box-and-whisker plots, line charts, and geographic heat maps.

C. Related Work
A study by [8] visualized COVID-19 case data from around the world using the DBSCAN model. The visualization may help the authorities to learn about and monitor the spread of COVID-19 around the world. The system may also help to reduce misinformation about COVID-19 and decrease social media panic. The data were taken from the Johns Hopkins University dataset in a GitHub repository.
In a second study by [9], the data visualization is based on exploratory data analysis (EDA) of the South Asian Association for Regional Cooperation (SAARC) countries. The authors observed the number of confirmed deaths, recovered cases, and active cases of patients infected with COVID-19 in these countries. A machine learning model, the K-Means algorithm, was applied to group the SAARC countries by the intensity of pandemic cases. The goal is to predict the upcoming progression of the COVID-19 pandemic.
Another study by [10] aimed to visualise the number of healthcare facilities and residential areas in the city of Bhubaneshwar, India. The visualisation, which made use of Folium maps, may assist in regulating how medical resources are distributed during COVID-19 in densely populated areas. For clustering, the authors employed HDBSCAN (Hierarchical DBSCAN), OPTICS (Ordering Points To Identify Cluster Structure), and DBSCAN. The authors claim that the clustering index results of the OPTICS algorithm are better than those of the DBSCAN and HDBSCAN algorithms. OPTICS has more parameters than DBSCAN and produces reachability plots in order to build the clusters. HDBSCAN, on the other hand, builds a multi-level tree through hierarchical clustering, from which the clusters are created.

III. METHODOLOGY
The Waterfall model is used as the Software Development Life Cycle (SDLC) technique [11]. The Waterfall model has five steps that must be performed in order, as seen in Fig. 1:

A. Requirement gathering and analysis
The goal of this phase is to collect information, data and requirements for the proposed project. The project focuses on applying a machine learning method, specifically a density-based grouping method, to data visualisation for specific areas in Malaysia.

B. System design and Implementation
This phase focuses on system development based on the DBSCAN model.
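The DBSCAN procedure outlined in Section II can be illustrated with a short from-scratch sketch. It is not the authors' code (their snippet in Fig. 5 is not reproduced in the text and may rely on a library); the function names `region_query` and `dbscan` and the sample coordinates are assumptions made for illustration only.

```python
from math import dist

NOISE = -1          # label for points that belong to no cluster
UNVISITED = None    # internal marker used while the scan is running


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (the point itself included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE            # may become a border point later
            continue
        cluster += 1                     # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)   # j is also core: keep expanding
    return labels


# Illustrative coordinates only: two dense squares and one isolated point.
sample = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
print(dbscan(sample, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this sketch a point counts itself among its neighbours, so `min_pts=3` means the point plus two others within `eps`. The isolated point has no core point within reach and therefore keeps the noise label.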

Fig. 1. Five phases of Waterfall model. Adapted from [11]

1) System Architecture Design: This study used a machine learning clustering technique, the DBSCAN algorithm, to cluster the data based on vaccination rate and vaccination centre. Fig. 2 shows the steps for forming the clusters using the DBSCAN algorithm.

Fig. 2. Steps in DBSCAN clustering model.

2) User Interface Design: Fig. 3 shows the use case of the project. It shows the five main features of the system with their functionality, and the particular output that is shown to the end user in each feature. The user can see the cluster analysis of the COVID-19 vaccination rate in Malaysia on the website through the data visualization techniques.

Fig. 3. Use case of the proposed system with the features of the system: i) Homepage Dashboard, ii) Adult Vaccination Rate Dashboard, iii) Adolescents Vaccination Rate Dashboard, iv) Vaccination Rate by Area Dashboard, and v) Vaccination Centre Dashboard.

3) System Implementation: In this implementation phase, all the information from the requirement phase and the system design is converted into source code with fully functional features. Fig. 4 shows the four main steps of the implementation phase: i) collecting data, ii) data pre-processing, iii) integration of the DBSCAN model in back-end system development, and iv) front-end development. The back end is implemented in the Python programming language. The front end, or web-based development, uses HTML, CSS and JavaScript.

Fig. 4. Four steps in implementation phase.

The raw data were collected from the GitHub website: https://github.com/CITF-Malaysia/citf-public/blob/main/CONTRIB.md. All the raw data were pre-processed in order to transform them into a usable format. In total, 33 columns and 77736 rows of data were collected. The data were also checked for missing values. After the data pre-processing, the DBSCAN clustering model was used to categorize the data by state in Malaysia. In order to achieve the best results, the machine learning model was run with different parameters. In this trial-and-error technique, the optimal values of eps and MinPts are those that give the greatest number of clusters and the smallest number of outliers. Six models were created in this study. The models cover three populations, i) adults and adolescents combined, ii) adults and iii) adolescents, and the data range from 27 February 2021 to 30 June 2022. There are also two categories of vaccination status: i) complete and ii) booster. Fig. 5 shows a code snippet of the DBSCAN model used to identify the number of clusters and outliers.
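The trial-and-error selection of eps and MinPts can be sketched as a small parameter sweep. This is a hypothetical illustration rather than the code in Fig. 5: the helper names, the candidate values and the sample points are assumptions, and ranking by cluster count first and outlier count second is only one reading of "greatest number of clusters and smallest number of outliers". The counting relies on the fact that DBSCAN clusters correspond to connected groups of core points, and that noise points are non-core points with no core point within eps.

```python
from itertools import product
from math import dist


def count_clusters_and_noise(points, eps, min_pts):
    """Count DBSCAN clusters and noise points without assigning labels."""
    n = len(points)
    neighbours = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbours]

    # Each connected group of core points forms exactly one cluster.
    seen, clusters = set(), 0
    for i in range(n):
        if not core[i] or i in seen:
            continue
        clusters += 1
        stack = [i]
        seen.add(i)
        while stack:
            for j in neighbours[stack.pop()]:
                if core[j] and j not in seen:
                    seen.add(j)
                    stack.append(j)

    # Noise: not a core point and no core point within eps.
    noise = sum(1 for i in range(n)
                if not core[i] and not any(core[j] for j in neighbours[i]))
    return clusters, noise


def sweep(points, eps_values, min_pts_values):
    """Try every (eps, min_pts) pair; prefer more clusters, then fewer outliers."""
    trials = []
    for eps, min_pts in product(eps_values, min_pts_values):
        clusters, noise = count_clusters_and_noise(points, eps, min_pts)
        trials.append((eps, min_pts, clusters, noise))
    best = max(trials, key=lambda t: (t[2], -t[3]))
    return best, trials


# Illustrative coordinates and candidate values only.
sample = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
best, trials = sweep(sample, eps_values=[0.5, 1.5, 8.0], min_pts_values=[2, 3])
print(best)  # (1.5, 2, 2, 1): eps, min_pts, clusters, outliers
```

The two goals can pull in opposite directions: a large eps removes every outlier by merging everything into one cluster, while a small eps can split the data into many small clusters. It is therefore worth inspecting the full list of trials rather than only the top-ranked pair.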
Fig. 5. Snippet of DBSCAN model used in the proposed system.

The Plotly library was used to turn the clustered data into visualizations, and nine visualizations were created in the proposed system. The main visualization in the dashboard is the geographical heat map of COVID-19 vaccination in Malaysia, shown in Fig. 6. The colour of the heat map is based on the percentage vaccination rate in each state in Malaysia. In addition, a green dot is pinned on the map to mark the location of each vaccination centre, as seen in Fig. 7. The other data visualizations are i) bar chart of vaccination rate by state, ii) line chart of vaccination rate in Malaysia by month, iii) pie chart of vaccination rate by nationality, iv) scatter plot of vaccination rate of the high-risk population, v) bar chart of vaccine doses used, vi) grouped bar chart of vaccination rate by area in Malaysia, vii) horizontal grouped bar chart of vaccination rate in each area in Malaysia and viii) bar chart of vaccination centres in Malaysia. Examples of the data visualizations can be seen in Fig. 8, Fig. 9, Fig. 10 and Fig. 11.

Fig. 6. Interface of geographical heat map of COVID-19 vaccination rate in Malaysia.

Fig. 7. Interface of geographical heat map with green dots of COVID-19 vaccination centres in Malaysia.

Fig. 8. Interface of bar chart of COVID-19 vaccination rate in Malaysia.

Fig. 9. Interface of line chart of COVID-19 vaccination rate in Malaysia.

Fig. 12 shows the homepage of the proposed system, which links to four other dashboards: i) adult vaccination rate, ii) adolescents vaccination rate, iii) vaccination rate by area and iv) list of vaccination centres. The dashboards can be accessed using the navigation menu on the left of the page. The vaccination rate by area dashboard shows the vaccination rate of six areas in Malaysia: South, North, East, West, East Coast and Wilayah Persekutuan. The last dashboard allows the user to find the location of a vaccination centre by state and city.

TABLE I
RESULT OF FUNCTIONALITY TEST OF TEST CASE

Test Case                                  Verification (Success/Fail)
Homepage Dashboard                         Success
Adult Vaccination Rate Dashboard           Success
Adolescents Vaccination Rate Dashboard     Success
Vaccination Rate by State Dashboard        Success
Vaccination Centre Dashboard               Success

Fig. 10. Interface of pie chart of COVID-19 vaccination rate in Malaysia.

TABLE II
SUS ITEMS AND RESULTS FOR EACH ITEM

No.  Questionnaire                                                                                              Results (%)
1    I think I would love to use this website continually.                                                      94
2    I found the website unnecessarily complex.                                                                 34
3    I found this website is informative.                                                                       98
4    I think that I would need the support of a technical person to be able to use this website’s function.     32
5    I think most functions in the website are properly integrated.                                             98
6    I found there was excessive amount of inconsistency in this website.                                       34
7    I would imagine that most people would learn to use this website very instantaneously.                     92
8    I thought the website really awkward to use.                                                               28
9    I felt very confident to use this website.                                                                 92
10   I needed to catch up a lot of stuff before I could get along with this website.                            38

Fig. 11. Interface of scatter chart of COVID-19 vaccination rate in Malaysia.

C. Results and testing
Three tests were conducted in this project: (i) functionality testing, (ii) usability testing and (iii) clustering accuracy testing.
1) Functionality Testing: In this project, five test cases were tested, as shown in Table I. The objective of the functionality test is to validate whether each feature of the system can be used or not [12]. The validation is rated as “Success” or “Failure”. The results show that all the test cases passed the testing and met the expected outcome.
2) Usability Testing: The goal of usability testing is to measure the user’s satisfaction after using the proposed system. One suitable questionnaire that has been validated and is available to use is the System Usability Scale (SUS) [13]. The SUS consists of ten items, which can be seen in Table II. In this study, 10 respondents were invited to test the system and were asked to rate their feeling towards the system. The final result is calculated by totalling the scores for the odd items and for the even items separately [14]. Five is then subtracted from the total for the odd items, while the total for the even items is subtracted from 25. The two new totals are added together and multiplied by 2.5, so that the SUS score lies in the range of 0 to 100. A higher score indicates better user satisfaction after using the system. The final score for the proposed system is 88.5%, which is considered a high score in usability testing.
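The SUS calculation can be checked with a few lines of Python. The function name `sus_score` is hypothetical, and the worked example rests on an assumption that is not stated in the paper: that each percentage in Table II is the mean rating for that item divided by the maximum rating of 5. Under that assumption the published item results reproduce the reported overall score.

```python
def sus_score(ratings):
    """System Usability Scale score (0 to 100) from ten ratings on a 1-5 scale.

    ratings[0] is item 1, ratings[1] is item 2, and so on. Mean ratings across
    respondents are accepted as well as one respondent's integer ratings.
    """
    if len(ratings) != 10:
        raise ValueError("SUS needs exactly ten item ratings")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("each rating must lie between 1 and 5")
    odd_total = sum(ratings[0::2])    # items 1, 3, 5, 7, 9 (positively worded)
    even_total = sum(ratings[1::2])   # items 2, 4, 6, 8, 10 (negatively worded)
    return ((odd_total - 5) + (25 - even_total)) * 2.5


# Assumption: each Table II percentage is the mean item rating divided by 5.
table_ii_percent = [94, 34, 98, 32, 98, 34, 92, 28, 92, 38]
mean_ratings = [p / 20 for p in table_ii_percent]
print(round(sus_score(mean_ratings), 1))  # 88.5
```

The even-numbered items are negatively worded, which is why their total is subtracted from 25 rather than reduced by a constant: a low rating on those items signals high satisfaction.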
Fig. 12. Home page or dashboard of the proposed system.

3) Cluster Accuracy Testing: The cluster accuracy test uses the silhouette coefficient as a metric to assess the quality of the clustering produced by the machine learning model [15], [16]. The result ranges between -1 (low quality model) and 1 (high quality model). Table III shows the accuracy result of each feature (model) in the proposed system implemented using the DBSCAN algorithm. As can be seen, the accuracy of each model is greater than 0.3; therefore, it can be concluded that the quality of the DBSCAN technique in clustering the collected data is good.
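For readers unfamiliar with the metric, the silhouette coefficient can be computed from scratch as in the sketch below. It is an illustration under stated assumptions, not the authors' evaluation code: the function name, the sample points, and the decision to leave DBSCAN noise points (label -1) out of the average are all assumptions, and the paper does not say how outliers were treated. For each point, a is the mean distance to the other members of its own cluster, b is the mean distance to the nearest other cluster, and the point's score is (b - a) / max(a, b).

```python
from math import dist
from statistics import mean

NOISE = -1  # assumed label for DBSCAN outliers; they are skipped here


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all clustered (non-noise) points."""
    clusters = {}
    for p, label in zip(points, labels):
        if label != NOISE:
            clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for label, members in clusters.items():
        for idx, p in enumerate(members):
            if len(members) == 1:
                scores.append(0.0)       # convention for single-point clusters
                continue
            # a: mean distance to the other members of p's own cluster
            a = mean(dist(p, q) for k, q in enumerate(members) if k != idx)
            # b: mean distance to the members of the nearest other cluster
            b = min(mean(dist(p, q) for q in other)
                    for other_label, other in clusters.items()
                    if other_label != label)
            denom = max(a, b)
            scores.append((b - a) / denom if denom else 0.0)
    return mean(scores)


# Illustrative data: two tight, well-separated pairs plus one outlier.
sample = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 5)]
labels = [0, 0, 1, 1, NOISE]
print(round(silhouette_score(sample, labels), 3))  # 0.9
```

Values near 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest points assigned to the wrong cluster.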

TABLE III
OUTPUT OF CLUSTERING ACCURACY TESTING

Model                                            Accuracy
Full vaccination for all population              0.421
Booster vaccination for all population           0.320
Full vaccination for adult population            0.418
Booster vaccination for adult population         0.316
Full vaccination for adolescent population       0.552
Booster vaccination for adolescent population    0.414

D. Documentation
In the final phase, the information about the system development is documented. The aim of the documentation is to provide information showing that all requirements of the project have been met by the researcher.

IV. LIMITATION OF THE PROJECT AND FUTURE WORK
There are some limitations in the proposed project and system:
1) During data collection, the raw data had to be collected from different sources. This results in different column name formats and increases the time needed to process all the data.
2) The data used in the proposed system are not up to date, and the system can only process data up to June 2022.
3) Furthermore, on the heat map, the proposed system has limited information about the vaccination centres in Malaysia, such as the address.
However, the current system can be improved in future work, such as:
1) The proposed system can process and analyse raw data based on real-time data, so that all the information can be updated automatically every day.
2) The address of each vaccination centre can be added to the visualization map, which would increase user interactivity with the system.
3) The scope of the data set can be expanded; for example, the data visualization can include a category for children aged between 5 and 11 years old.

V. CONCLUSION
The proposed data visualization system is developed using the Python programming language with PyCharm software. A clustering analysis using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) model is used to cluster the population data of each state and the proportion of full or booster vaccination rates in Malaysia. The data are visualized in a geographical heat map, which assists users in locating the nearest vaccination centre in their neighbourhood. The proposed system aims to help the Malaysian government to enhance its efforts to provide mobile vaccination centres to Malaysian people in critical regions. The data visualization can also help Malaysian people and the Malaysian government to keep track of the rate of COVID-19 vaccination for the adult or adolescent population in six different areas in Malaysia: South, North, East, West, East Coast and Wilayah Persekutuan.

ACKNOWLEDGEMENT
The authors would like to thank Universiti Teknologi MARA Malacca Branch, Center of Vision and Algorithms Analytics (C-VAA), and Information Technology for Organizations (ITFO) Research Groups for the support throughout this research.

REFERENCES
[1] J. W. Ng, E. T. J. Chong, Y. A. Tan, H. G. Lee, L. L. Chan, Q. Z. Lee, Y. T. Saw, Y. Wong, A. A. B. Zakaria, Z. B. Amin, et al., Prevalence of coronavirus disease 2019 (covid-19) in different clinical stages before the national covid-19 vaccination programme in malaysia: A systematic review and meta-analysis, International journal of environmental research and public health 19 (4) (2022) 2216.
[2] M. H. Elnaem, N. H. Mohd Taufek, N. S. Ab Rahman, N. I. Mohd Nazar, C. S. Zin, W. Nuffer, C. J. Turner, Covid-19 vaccination attitudes, perceptions, and side effect experiences in malaysia: Do age, gender, and vaccine type matter?, Vaccines 9 (10) (2021) 1156.
[3] N. N. Chan, K. W. Ong, C. S. Siau, K. W. Lee, S. C. Peh, S. Yacob, Y. C. Chia, V. K. Seow, P. B. Ooi, The lived experiences of a covid-19 immunization programme: vaccine hesitancy and vaccine refusal, BMC Public Health 22 (1) (2022) 1–13.
[4] H. S. Teh, Y. L. Woon, C. T. Leong, N. Y. L. Hing, T. Y. S. Mien, L. S. Roope, P. M. Clarke, L.-L. Lim, J. Buckell, Malaysian public preferences and decision making for covid-19 vaccination: a discrete choice experiment, The Lancet Regional Health-Western Pacific 27 (2022) 100534.
[5] L. Muhammad, M. Islam, S. S. Usman, S. I. Ayon, et al., Predictive data mining models for novel coronavirus (covid-19) infected patients’ recovery, SN Computer Science 1 (4) (2020) 1–7.
[6] H. El Bahi, A. Zatni, Document text detection in video frames acquired by a smartphone based on line segment detector and dbscan clustering, Journal of Engineering Science and Technology 13 (2) (2018) 540–557.
[7] G. Chawla, S. Bamal, R. Khatana, Big data analytics for data visualization: Review of techniques, International Journal of Computer Applications 182 (21) (2018) 37–40.
[8] H. A. Wafa, R. Aminuddin, S. Ibrahim, N. N. A. Mangshor, N. I. F. A. Wahab, A data visualization framework during pandemic using the density-based spatial clustering with noise (dbscan) machine learning model, in: 2021 IEEE 11th International Conference on System Engineering and Technology (ICSET), IEEE, 2021, pp. 1–6.
[9] K. C. Paul, M. A. Hoque, S. M. Dhiman, J. K. Sen, Data analytics on the covid-19 outbreak in south asia using machine learning methods, International Journal 10 (4) (2021).
[10] R. Kuila, P. Sengupta, M. Rout, R. K. Barik, Density based geospatial clustering: Methods, applications and future directions, in: Proceedings of the 2022 Fourteenth International Conference on Contemporary Computing, 2022, pp. 404–409.
[11] N. Upadhyay, M. Singh, M. Shrivastava, A. Mishra, Analysis and comparison with modern software development approaches, NVEO-NATURAL VOLATILES & ESSENTIAL OILS Journal | NVEO (2021) 7553–7559.
[12] M. Taufik, H. Hudiono, A. E. Rakhmania, R. H. Y. Perdana, A. S. Sari, An internet of things based intercity bus management system for smart city, International Journal of Computing and Digital Systems 10 (2021).
[13] M. R. Drew, B. Falcone, W. L. Baccus, What does the system usability scale (sus) measure?, in: International Conference of Design, User Experience, and Usability, Springer, 2018, pp. 356–366.
[14] J. Brooke, Sus: a retrospective, Journal of usability studies 8 (2) (2013) 29–40.
[15] L. Hu, H. Liu, J. Zhang, A. Liu, Kr-dbscan: A density-based clustering algorithm based on reverse nearest neighbor and influence space, Expert Systems with Applications 186 (2021) 115763.
[16] E. M. S. Rochman, A. Khozaimi, I. O. Suzanti, R. Jannah, B. K. Khotimah, A. Rachmad, et al., A combination of algorithm agglomerative hierarchical cluster (ahc) and k-means for clustering tourism in madura-indonesia, J. Math. Comput. Sci. 12 (2022) Article–ID.
