Abstract. This research addresses the classification of rabies cases using the CRISP-DM
methodology, through the phases of business understanding, data understanding, data
preparation, modeling, evaluation, and deployment. Data with similar characteristics are
grouped into one cluster, and data with different characteristics are grouped into
another. The clustering technique used is the K-Means algorithm, implemented in
RapidMiner software. In this case, the data are divided into three clusters. The
first cluster contains areas with a high level of rabies cases, the second cluster
contains areas with a medium level of rabies cases, and the third cluster contains
areas with a low level of rabies cases. Based on the results, of the 16 sub-districts,
7 are classified as areas with a high level of rabies cases (C1) (Kertapati, SU II,
Plaju, Kemuning, Kalidoni, Sako, and Alang-Alang Lebar), 4 are classified as areas with
a medium level of rabies cases (C2) (SU I, IB I, IT II, and Sukarami), and the remaining
5 are classified as areas with a low level of rabies cases (C3) (IB II, Gandus, Bukit
Kecil, IT I, and Sematang Borang).
Keywords: CRISP-DM, K-Means, Cluster, RapidMiner
1. Introduction
Rabies is a zoonotic disease that can attack all warm-blooded animals and humans. Rabies, otherwise
known as mad dog disease, is an infection caused by the rabies virus. According to WHO, domestic
dogs are the most common reservoir of the rabies virus, with over 95% of human deaths caused by dogs
that carry the virus. The virus is transmitted through the saliva of an infected animal, through which
it enters the human body. Several indicators are used in rabies control efforts: animal bite cases
(GHPR), cases vaccinated with the anti-rabies vaccine (VAR), and positive rabies cases and deaths
based on the Lyssa test [1].
Rabies is still a serious animal disease in Indonesia and is included among the strategic priority
infectious animal diseases because it affects the social economy and public health. Rabies cases in both
animals and humans almost always end in death (case fatality rate of 100%), and as a result the disease
causes fear and anxiety in society. In addition, those at high risk of contracting the virus include
children who live in areas prone to animal bite infections and communities in disadvantaged and remote
areas where public awareness of, and access to, environmental health is low. This causes spatial
variability between regions, each with its own specific characteristics [4]. This study approaches the
problem by clustering cases at the sub-district level. Clustering is part of data mining, whose
ultimate goal is to find previously unknown patterns and trends in databases and to use that
information to build predictive models [6].
Clustering is commonly used to map or classify objects, often with the k-means method.
This technique has been used many times by earlier researchers to cluster diseases in Indonesia, such as
tuberculosis (TB) [4] and dengue fever [7]. There is even research that classifies disaster-prone
provinces into several clusters as an early warning system that informs the public of high-risk
locations and safe locations for natural disasters [3], as well as research on clustering rice crops
with the help of RapidMiner software [5].
This research addresses the classification of rabies cases using the CRISP-DM methodology,
through the phases of business understanding, data understanding, data preparation, modeling,
evaluation, and deployment. The main purpose of this research is for the resulting epidemic clusters
to serve as input for the government, especially that of Palembang City, in its efforts to disseminate
information about the dangers of rabies cases, which often end in death, and to control those cases.
2.1 K-Means
K-Means (KM) clustering is a widely used partitioning method. It aims to partition N samples,
each described by D parameters, into K mutually exclusive clusters. Each of the K clusters is
defined by one central point (center of mass) determined by a specific combination of the parameters
contained in each sample [6].
Clustering with the K-Means method proceeds as follows [7]:
1. Select the number of clusters K.
2. Initialize the K cluster centers. This can be done in various ways, but the most common is
random initialization: each cluster center is given a random initial value.
3. Allocate all data/objects to the nearest cluster. The proximity of two objects is determined by
the distance between them. The distance from each data point to each cluster center
can be calculated with the Euclidean distance, formulated as follows:
$$D(i,j) = \sqrt{(x_{1i} - x_{1j})^2 + (x_{2i} - x_{2j})^2 + \cdots + (x_{ki} - x_{kj})^2} \qquad (1)$$
where:
D(i,j) = distance from data point i to the center of cluster j
x_{ki} = value of data point i on attribute k
x_{kj} = value of the center of cluster j on attribute k
4. Recalculate the cluster centers using the current cluster membership. The cluster center is the
average of all data/objects in a given cluster. If desired, the median of the cluster can be used
instead, so the average (mean) is not the only measure that can be used.
$$R_k = \frac{1}{N_k}\left(x_{1k} + x_{2k} + \cdots + x_{nk}\right) \qquad (2)$$
where:
R_k = new average (center) of cluster k
N_k = number of training patterns in cluster k
x_{nk} = n-th pattern belonging to cluster k
5. Reassign each object using the new cluster centers. If the cluster centers no longer change,
the clustering process is complete. Otherwise, go back to step 3.
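The five steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the RapidMiner implementation used in the study: the function names (`euclidean`, `mean_point`, `k_means`), the `seed` and `max_iter` parameters, and the toy points are all hypothetical, and the initial centers are drawn from the data points themselves, which is only one of the possible random initializations mentioned in step 2.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length points (Equation 1)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(points):
    """Attribute-wise mean of a non-empty list of points (Equation 2)."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def k_means(data, k, max_iter=100, seed=0):
    """Return (centroids, labels) for a list of numeric tuples."""
    rng = random.Random(seed)
    # Step 2 (assumption): use k distinct data points as the initial centers.
    centroids = [tuple(p) for p in rng.sample(data, k)]
    labels = [None] * len(data)
    for _ in range(max_iter):
        # Step 3: assign every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j])) for p in data
        ]
        # Step 5: stop once the assignments (and hence the centers) are stable.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: recompute each center as the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centroids[j] = mean_point(members)
    return centroids, labels


# Toy points invented for illustration only (not the study's data).
toy_points = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10),
              (20, 1), (21, 1), (20, 2)]
toy_centroids, toy_labels = k_means(toy_points, k=3)
```

Because the starting centers are random, different seeds can lead to different final clusters, so in practice the run is often repeated with several seeds and the best result kept.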
3. Research Methodology
This study used the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology
as the steps for conducting the research. CRISP-DM provides a standard data mining process as a
general problem-solving strategy for business and research. The CRISP-DM methodology consists of six
phases: Business Understanding, Data Understanding, Data Preparation,
Modeling, Evaluation, and Deployment [2].
Figure 1. CRISP-DM stages
1. Business Understanding.
This phase defines the business objectives: understanding the data on bites by
rabies-transmitting animals, the rabies indicators, and the way the disease is
transmitted and manifests.
2. Data Understanding.
In this phase, data on victims of bites by rabies-transmitting animals in Palembang
were collected. The data were provided as an Excel document. The attributes selected
are the population of each sub-district, the number of GHPR cases, and the number of
vaccination cases.
3. Data Preparation.
Data preparation includes all activities needed to build, from the raw data, the
dataset that will be fed into the modeling tool for data mining.
4. Modeling.
In this phase, the imported data are clustered using the K-Means algorithm in
RapidMiner.
5. Evaluation.
This phase assesses the extent to which the data mining model fulfills the data
mining goals determined in the business understanding phase.
6. Deployment.
In this phase, deployment takes the form of report generation and journal articles.
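The data-preparation phase can be illustrated with a short sketch that turns tabular records into numeric rows ready for clustering. Everything specific here is an assumption made for illustration: the study's data came as an Excel document, whereas this sketch reads CSV text; the column names (`sub_district`, `population`, `ghpr_cases`, `var_cases`) are hypothetical; the rows are invented placeholders, not the study's data; and dropping incomplete records is just one possible cleaning rule.

```python
import csv
import io

# Placeholder rows: names and numbers are invented for illustration only.
RAW = """sub_district,population,ghpr_cases,var_cases
A,120000,40,35
B,80000,12,10
C,60000,,3
"""

# Hypothetical column names for the three attributes named in the text.
ATTRIBUTES = ("population", "ghpr_cases", "var_cases")


def load_examples(text):
    """Parse CSV text into (names, rows), skipping rows with missing values."""
    names, rows = [], []
    for record in csv.DictReader(io.StringIO(text)):
        values = [record[a].strip() for a in ATTRIBUTES]
        if any(v == "" for v in values):
            continue  # incomplete record: leave it out of the modeling table
        names.append(record["sub_district"])
        rows.append(tuple(float(v) for v in values))
    return names, rows


names, rows = load_examples(RAW)
```

Each entry of `rows` is then one example with three numeric attributes, which is the shape a clustering routine expects.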
Furthermore, the collected data are grouped by region by applying the K-Means algorithm in
RapidMiner software with the specified number of clusters.
Figure 2 shows the configuration of the K-Means process design in RapidMiner with a
value of K = 3. The first operator reads an ExampleSet from the specified Excel file.
All required parameters are stored in this model object. The data then go into the
clustering model, which performs the clustering using the K-Means algorithm. After
clustering, a performance operator is used to evaluate the centroid-based
clustering method. This model assigns attribute values to the defined clusters. Based on the
design configuration in Figure 2, RapidMiner groups the GHPR case
areas according to the clusters that have been created.
Figure 3 shows the grouping of animal bite case areas: seven sub-districts fall in the
high cluster (C0), four sub-districts in the moderate cluster (C1), and five sub-districts
in the low cluster (C2).
Figure 4. The final centroid at the last iteration
As Figure 4 shows, the centroid values of each cluster continue to change from one iteration
to the next until they no longer change.
This sheet displays the results for the entire processed dataset, complete with the cluster
assignments. Cluster 0, shown in orange, contains 7 items; cluster 1, shown in green,
contains 4 items; and cluster 2, shown in blue, contains 5 items.
The Results perspective then displays the processed data as a whole, complete with the cluster
of each example in the example set (Read Excel). The Data view, shown in Figure 6, lists the
cluster assigned to each sub-district.
Figure 6. Data View
The clustering is measured with two parameters. Avg. within_centroid_distance: the average
within-cluster distance is calculated by averaging the distance between the centroid and all
examples of a cluster. Davies_bouldin: algorithms that produce clusters with low intra-cluster
distance (high intra-cluster similarity) and high inter-cluster distance (low inter-cluster
similarity) have a low Davies-Bouldin index, so the clustering algorithm that produces the
collection of clusters with the smallest Davies-Bouldin index is considered the best under this
criterion. The performance vector gives a Davies-Bouldin value of -0.484, so the result can
therefore be regarded as the best clustering under this criterion.
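The two measures described above can be written out as a short sketch. It follows the textbook definitions (average point-to-centroid distance, and the Davies-Bouldin index as the mean over clusters of the largest ratio of summed within-cluster scatter to centroid separation) and is not claimed to reproduce RapidMiner's exact computation or sign convention. The function names and the toy clusters are hypothetical, and the sketch assumes every cluster is non-empty and no two centroids coincide.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Attribute-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def scatter(points, center):
    """Average distance from the members of one cluster to its centroid."""
    return sum(euclidean(p, center) for p in points) / len(points)


def avg_within_centroid_distance(clusters):
    """Average point-to-centroid distance over all examples in all clusters."""
    total = sum(euclidean(p, centroid(c)) for c in clusters for p in c)
    return total / sum(len(c) for c in clusters)


def davies_bouldin(clusters):
    """Mean over clusters of the worst (scatter_i + scatter_j) / separation_ij."""
    centers = [centroid(c) for c in clusters]
    spreads = [scatter(c, m) for c, m in zip(clusters, centers)]
    k = len(clusters)
    worst = []
    for i in range(k):
        ratios = [
            (spreads[i] + spreads[j]) / euclidean(centers[i], centers[j])
            for j in range(k) if j != i
        ]
        worst.append(max(ratios))
    return sum(worst) / k


# Toy clusters invented for illustration only (not the study's data).
far_apart = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
close_together = [[(0, 0), (0, 2)], [(4, 0), (4, 2)]]
db_far = davies_bouldin(far_apart)
db_close = davies_bouldin(close_together)
```

Moving the second toy cluster closer to the first leaves the within-cluster scatter unchanged but shrinks the separation, so `db_close` is larger than `db_far`, matching the rule that a smaller index indicates better-separated clusters.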
5.1 Conclusion
Based on the results of the research and the discussion in the previous sections, the
authors conclude the following:
1. Clustering of animal bite (GHPR) case areas in Palembang City can be done with the
CRISP-DM data mining methodology using the K-Means method, where K-Means clusters the
data by partitioning.
2. From the data processing in RapidMiner software with the K-Means method, of the 16
sub-districts, 7 are classified as areas with a high level of rabies cases (C1)
(Kertapati, SU II, Plaju, Kemuning, Kalidoni, Sako, and Alang-Alang Lebar), 4 are
classified as areas with a medium level of rabies cases (C2) (SU I, IB I, IT II, and
Sukarami), and the remaining 5 are classified as areas with a low level of rabies cases
(C3) (IB II, Gandus, Bukit Kecil, IT I, and Sematang Borang). With these epidemic
clusters, the Palembang health service can plan intervention programs that are fast,
responsive, and on target.
REFERENCES