
Wollega University, Department of Informatics

No | Authors | Title | Journal Name | Volume and Date
1 | Rakhi Ray | Advances in Data Mining: Healthcare Applications | International Research Journal of Engineering and Technology (IRJET) | Volume 05, Issue 03, Mar-2018
2 | Olegas NIAKŠU, Olga KURASOVA | Data Mining Applications in Healthcare: Research vs Practice | Not defined | Not defined
3 | Sandeep Kautish, Rana Khudhair Abbas Ahmed | A Comprehensive Review of Current and Future Applications of Data Mining in Medicine & Healthcare | International Journal of Engineering Trends and Technology (IJETT) | Volume 38, Number 2, August 2016
4 | Dimple | A Review on Data Mining Techniques Used in Healthcare Industry | International Journal in Multidisciplinary and Academic Research (SSIJMAR) | Vol. 3, No. 1, February-March 2014 (ISSN 2278-5973)
5 | Abd Elrazek Abd Elrazek | How Can Data Mining Improve Health Care? | Applied Mathematics & Information Sciences: An International Journal | No. 2, 585-588 (2017)



List of Abbreviations

SVM : Support Vector Machine

K-NN : K-Nearest Neighbor

DB : Database

DM : Data Mining

KDD : Knowledge Discovery in Databases


ABSTRACT

Data mining is an efficient way to extract relevant information from a database. It is widely
applied in medicine to diagnose diseases such as skin cancer, breast cancer, lung cancer,
diabetes, liver disorders, heart disease, kidney failure, kidney stones, and hepatitis. New
technologies are being developed to examine physical conditions and to find the symptoms of
different diseases. As an intelligent technology, data mining can improve health care systems by
saving time, effort, and money and by raising the overall quality of medical care. This paper
mainly describes the data mining techniques used for each disease.


1. Data Mining Techniques in health care.

1.1 Introduction

The healthcare domain is known for its ontological complexity, its variety of medical data
standards, and its variable data quality. Healthcare is a booming sector of the economy in many
countries. With its growth come challenges, including rising costs, inefficiencies, poor quality,
and increasing complexity. Performance measurement and reporting have now become commonplace
in most health care settings, and a large volume of data is collected through these systems on a
regular basis. Machine learning can be used to evaluate planning for health care quality, and
this paper sheds light on the importance of data mining and machine learning for that purpose.
Analytics provides tools and techniques to extract information from this complex and voluminous
data and to translate it into knowledge that assists decision-making in healthcare. This paper
presents a survey of data mining applications in the healthcare sector. Several studies on data
mining for healthcare are available. A review of these studies shows that data mining techniques
are used in several medical applications, such as hospital management, the pharmaceutical
industry, and the medical device industry. This paper aims to provide an overview of data mining
techniques, from their definition to their application.


1.2 Data Mining Tools and Techniques in health care.

1. Classification

Classification is one of the most widely used data mining methods in the healthcare sector. It
divides data samples into target classes: the classification technique predicts the target class
for each data point. With the help of a classification approach, a risk factor can be associated
with patients by analyzing their patterns of disease.

Binary and multiclass are the two methods of classification. In binary classification, only two
possible classes, such as “high” or “low” risk patient, are considered, while the multiclass
approach has more than two targets, for example “high”, “medium”, and “low” risk patient.

1.1 K-Nearest Neighbor (K-NN)

The K-Nearest Neighbor (K-NN) classifier is one of the simplest classifiers. It classifies an
unlabeled data point using previously known data points (its nearest neighbors) and assigns the
class according to a voting system.

The majority vote of the k nearest neighbors plays a very important role in classifying any new
instance. K-NN is one of the simplest data mining techniques. It is mainly known as memory-based
classification because the training examples must always be in memory at run time. K-NN has
applications in many areas, such as health datasets, image processing, cluster analysis, pattern
recognition, and online marketing.
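The voting scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the reviewed papers: the function name `knn_predict`, the Euclidean distance measure, and the (age, systolic blood pressure) records are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # The whole training set is kept in memory and sorted by Euclidean
    # distance to the query point; only the k closest examples are kept.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Each of the k nearest neighbours votes with its class label.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical (age, systolic blood pressure) records with risk labels.
patients = [
    ((25, 110), "low"),
    ((30, 115), "low"),
    ((35, 120), "low"),
    ((60, 160), "high"),
    ((65, 170), "high"),
    ((70, 165), "high"),
]

prediction = knn_predict(patients, (62, 158), k=3)
```

Because every prediction scans the stored examples, the sketch also makes the memory and response-time cost of K-NN visible.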

Advantages of K-NN classifiers

• Ease
• Efficacy
• Intuitiveness
• Competitive
• Robust to noisy training data

Disadvantage of K-NN classifiers

A large amount of memory is needed to store the whole sample. If the sample is big, the
response time on a sequential computer will also be large.


1.2 Decision Tree (DT)

The decision tree (DT) is considered one of the most popular approaches for representing a
classifier. A decision tree can be constructed from available data and can deal with problems
from various research areas. It is equivalent to a flowchart in which every non-leaf node denotes
a test on a particular attribute, every branch denotes an outcome of that test, and every leaf
node holds a class label. The root node is the topmost node of the tree. For example, with the
help of a medical readmission decision tree, we can decide whether a particular patient requires
readmission or not. Domain knowledge is not required to build a decision tree for a problem.

Uses of Decision Trees

• Analysis for calculating conditional probabilities.
• Choosing the best alternative: the traversal from root to leaf indicates a unique class
separation based on maximum information gain.
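To illustrate what "maximum information gain" means when a tree chooses its test attribute, the sketch below computes entropy and information gain for two candidate attributes. The readmission records, attribute names, and function names are hypothetical and chosen only for this example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on `attribute`."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical readmission records (not taken from any reviewed study).
records = [
    {"prior_admissions": "many", "lives_alone": "yes", "readmit": "yes"},
    {"prior_admissions": "many", "lives_alone": "no", "readmit": "yes"},
    {"prior_admissions": "few", "lives_alone": "yes", "readmit": "no"},
    {"prior_admissions": "few", "lives_alone": "no", "readmit": "no"},
]

gains = {a: information_gain(records, a, "readmit")
         for a in ("prior_admissions", "lives_alone")}
# The attribute with the highest gain would become the test at the root node.
best_attribute = max(gains, key=gains.get)
```

In this toy data `prior_admissions` separates the classes perfectly while `lives_alone` tells us nothing, so the former is selected.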

Advantages

• Decision trees are self-explanatory and, when compact, also easy to follow.
• A set of rules can also be constructed from a decision tree.
• They can handle both nominal and numeric input attributes.
• Datasets with missing or erroneous values can be handled easily by decision trees.

Disadvantages

• Most of the algorithms (like ID3 and C4.5) require that the target attribute have only
discrete values.
• Because decision trees use the divide-and-conquer method, their performance is low when
complex interactions exist among the attributes.


1.3 Support Vector Machine (SVM)

The main aim of an SVM is to create a hyperplane that separates the data points of different
classes.
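As a rough illustration of how a separating hyperplane classifies points, the sketch below evaluates the linear decision function w·x + b for a hand-picked hyperplane. The weights, bias, and sample points are assumptions made for the example; no SVM training is performed here.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells on which side of the hyperplane x lies."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def margin_width(w):
    """Width of the margin between the planes w.x + b = +1 and w.x + b = -1."""
    return 2 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane x1 + x2 = 3 (chosen by hand, not learned).
w, b = (1.0, 1.0), -3.0

# Hypothetical labelled points; each lies exactly on one of the margin planes.
points = [((1.0, 1.0), -1), ((0.5, 1.5), -1), ((2.0, 2.0), 1), ((3.0, 1.0), 1)]

predicted = [classify(w, b, x) for x, _ in points]
```

A trained SVM would choose `w` and `b` so that the margin is as wide as possible, and the points lying on the margin planes would be its support vectors.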

Advantages

• Effective in high-dimensional spaces.
• Effective in cases where the number of dimensions is greater than the number of samples.
• Memory efficient, because it uses a subset of the training points in the decision function.
• Versatile, because different kernel functions can be specified for the decision function.

Disadvantages

• If the number of features is much greater than the number of samples, SVM is more likely to
give poor performance.
• It does not directly provide probability estimates.

1.4 Neural Network (NN)

Neural networks (NN) are classification algorithms used in various biomedical and healthcare
fields. For example, NN has been widely used to support the diagnosis of diseases, including
cancers, and to predict outcomes. The basic elements of an NN are neurons, or nodes. These
neurons are interconnected and work together in parallel within the network to produce the
output functions. The basic property of an NN is that it can minimize its error by adjusting its
weights and by making changes in its structure; this is possible because of its adaptive nature.
NNs are capable of producing predictions of high accuracy.
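The idea of minimizing error by adjusting weights can be shown with a single sigmoid neuron trained by gradient descent. This is only a sketch under stated assumptions: the learning rate, the number of epochs, and the toy rule "flag the patient if either of two symptoms is present" are hypothetical, and a real network would have many neurons and layers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, inputs):
    """Output of one neuron: weighted sum of inputs passed through a sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_neuron(samples, epochs=2000, learning_rate=0.5):
    """Adjust weights and bias step by step to shrink the prediction error."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = predict(weights, bias, inputs) - target
            # Gradient step for the cross-entropy loss of a sigmoid neuron.
            weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias

# Hypothetical toy data: two binary symptoms, label 1 if either is present.
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_neuron(samples)
outputs = [predict(weights, bias, inputs) for inputs, _ in samples]
```

After training, the neuron's output is below 0.5 for the symptom-free case and above 0.5 for the others.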

Advantages

• Properly handles noisy training data.
• Reasonably classifies new types of data that differ from the training data.

Disadvantages

• It requires many parameters, including the optimum number of hidden-layer nodes, which are
determined empirically, and its classification performance is very sensitive to the parameters
selected.
• Its learning process is very slow and computationally expensive.
• It does not provide any internal details about the phenomenon under investigation.

1.5 Bayesian Methods

Bayesian classification is a probabilistic learning method, and it can be obtained easily with
the help of a classification algorithm. Bayes' theorem plays a very important role in it. In the
medical domain, attributes such as patient symptoms and health state are correlated with each
other, but the Naïve Bayes classifier assumes that all attributes are independent of each other.
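The independence assumption can be seen directly in code: the score of each class is the class prior multiplied by one per-attribute probability at a time. The sketch below is a minimal categorical Naïve Bayes with Laplace smoothing; the symptom data, labels, and function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count how often each attribute value occurs within each class."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)   # (attribute index, class) -> value counts
    values = defaultdict(set)               # attribute index -> observed values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, label)][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values

def posterior(model, row):
    """Return normalised class probabilities for one record."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # Laplace-smoothed P(value | class); multiplying these terms is
            # exactly the "attributes are independent" assumption.
            score *= (feature_counts[(i, label)][value] + 1) / (count + len(values[i]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: score / norm for label, score in scores.items()}

# Hypothetical records: (fever, cough) -> diagnosis.
rows = [("yes", "yes"), ("yes", "no"), ("yes", "yes"),
        ("no", "no"), ("no", "yes"), ("no", "no")]
labels = ["flu", "flu", "flu", "healthy", "healthy", "healthy"]

model = train_naive_bayes(rows, labels)
result = posterior(model, ("yes", "yes"))
```

When symptoms are in fact correlated, the multiplied terms count the same evidence more than once, which is why the assumption matters for medical data.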

Disadvantage

The Naïve Bayes classifier achieves high accuracy when the attributes are independent of each
other, so its performance depends on an assumption that correlated medical attributes may not
satisfy.

Advantages

• It makes the computation process very easy.
• It has better speed and accuracy for huge datasets.

2. Regression

Regression is a very important data mining technique. With its help, we can identify functions
that describe the correlation among different variables. Regression is mostly used to inspect
the relationship between variables.
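As a small worked example, the sketch below fits a straight line to two variables with ordinary least squares. The age and blood pressure values are made up for illustration, and simple linear regression is only one of many regression forms.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: age in years and systolic blood pressure.
ages = [30, 40, 50, 60, 70]
systolic = [115, 120, 125, 130, 135]

slope, intercept = fit_line(ages, systolic)
predicted_at_55 = slope * 55 + intercept
```

The fitted slope expresses how much the second variable changes per unit of the first, which is the relationship regression is used to inspect.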

3. Clustering

Clustering is an unsupervised learning method whose main task is to form clusters from a large
database on the basis of a similarity measure. The goal of clustering is to discover a new set
of categories; the new groups are of interest in themselves, and their assessment is intrinsic.

3.1 Partitional Clustering

Partitional algorithms are categorized according to how they relocate objects, how they select a
cluster centroid (or representative) among the objects within an (incomplete) cluster, and how
they measure similarities between objects and cluster centroids.

Advantages

• Handles large data sets, which hierarchical algorithms cannot.
• Clusters data quickly.

Drawbacks

• Clustering results depend to some degree on the initial cluster centroids, because the
centroids are randomly selected.
• Different clustering results may be obtained each time a partitional algorithm is run.
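The dependence on initial centroids can be reproduced with a small k-means sketch, k-means being a common partitional algorithm. To keep the effect repeatable, the initial centroids are passed in explicitly instead of being drawn at random; the four points and the function name `kmeans` are assumptions made for this example.

```python
import math

def kmeans(points, initial_centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = list(initial_centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of the cluster members (unchanged if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return clusters

# Hypothetical points at the corners of a wide rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Same data, two different starting centroids, two different final partitions.
left_right = kmeans(points, [(0, 0), (4, 0)])
bottom_top = kmeans(points, [(0, 0), (0, 1)])
```

The first start groups the points into a left and a right cluster, while the second converges to a poorer bottom and top split, which is why partitional algorithms are often run several times.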

3.2 Hierarchical Clustering

Data points can be partitioned in a tree-like, hierarchical way using either a top-down
(divisive) or a bottom-up (agglomerative) approach. The divisive approach initially places all
data points in a single group and iteratively partitions it into smaller groups until each data
point belongs to one and only one cluster.

The single-link algorithm

Selects the closest pair of objects from two groups and uses the similarity between those
objects as the group similarity.

The complete-link algorithm

Calculates the similarity between the most distant pair of objects from two groups.

The average-link algorithm


Selects all pairs of objects from two groups and averages all possible distances between objects.
Among the various hierarchical algorithms the average-link algorithm provides the best accuracy
in most cases.
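The three linkage criteria differ only in how they summarize the pairwise distances between two groups, as the sketch below shows. It uses Euclidean distance, where a smaller distance means greater similarity, and the two groups of points are hypothetical.

```python
import math
from itertools import product

def pair_distances(group_a, group_b):
    """Distances between every object of one group and every object of the other."""
    return [math.dist(a, b) for a, b in product(group_a, group_b)]

def single_link(group_a, group_b):
    """Distance between the closest pair of objects."""
    return min(pair_distances(group_a, group_b))

def complete_link(group_a, group_b):
    """Distance between the most distant pair of objects."""
    return max(pair_distances(group_a, group_b))

def average_link(group_a, group_b):
    """Mean of all pairwise distances."""
    distances = pair_distances(group_a, group_b)
    return sum(distances) / len(distances)

# Two hypothetical groups of 2-D points.
group_a = [(0, 0), (0, 1)]
group_b = [(3, 0), (4, 0)]

linkages = (
    single_link(group_a, group_b),
    average_link(group_a, group_b),
    complete_link(group_a, group_b),
)
```

An agglomerative algorithm would repeatedly merge the two groups with the smallest linkage value under whichever criterion is chosen.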

Advantage

• Visualization capability that shows how similar the objects in the data set are to one
another.

3.3 Density Based Clustering

Density-based clustering methods play a very important role in biomedical research because they
can handle clusters of arbitrary shape. DBSCAN, OPTICS, and DENCLUE are density-based clustering
algorithms that have been used to obtain useful patterns from very large biomedical image
databases.
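A compact sketch of DBSCAN, one of the algorithms named above, shows how density rather than shape defines a cluster. The parameter names `eps` and `min_pts`, the brute-force neighbour search, and the sample points are assumptions for illustration; production implementations use spatial indexes for large databases.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one cluster id per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # All points within distance eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisional noise; may later become a border point
            continue
        cluster_id += 1             # i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id      # noise reached from a core point becomes border
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            reachable = neighbours(j)
            if len(reachable) >= min_pts:   # j is also a core point: keep expanding
                queue.extend(reachable)
    return labels

# Two hypothetical dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (5, 5), (5, 6), (6, 5), (6, 6),
          (10, 0)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Points in sparse regions are labelled as noise instead of being forced into a cluster.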

4. Apriori Algorithm

The Apriori algorithm is used to discover frequent diseases in medical data. One study proposed
a method for detecting the occurrence of diseases in particular geographical locations at
particular periods of time using the Apriori algorithm. The algorithm requires two user inputs.
The first is support, because users are interested in association rules (over sets of
transactions) that occur frequently in a database. The second is confidence, a measure of a
rule's accuracy expressed as a percentage. For example, if male breast cancer cases are not
frequent, no association rules related to the disease are generated.
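The roles of the two inputs can be sketched as follows. This is a simplified level-wise version of Apriori written for illustration; the patient records, the support threshold of 0.4, and the function names are hypothetical.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Level-wise search: only frequent k-itemsets are extended to (k+1)-itemsets."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while current:
        level = {c: support(c, transactions) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        size += 1
        # Join step: merge frequent itemsets into candidates of the next size.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
        # Prune step: every smaller subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, size - 1))]
    return frequent

def confidence(antecedent, consequent, transactions):
    """How often the consequent appears in transactions containing the antecedent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

# Hypothetical patient records, each a set of diagnoses.
transactions = [
    {"diabetes", "hypertension"},
    {"diabetes", "hypertension", "kidney disease"},
    {"diabetes", "hypertension"},
    {"hypertension"},
    {"asthma"},
]

frequent = frequent_itemsets(transactions, min_support=0.4)
rule_confidence = confidence(frozenset({"hypertension"}), frozenset({"diabetes"}), transactions)
```

Here the rare diagnoses fall below the support threshold, so no itemsets or rules involving them are produced, which mirrors the male breast cancer example.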


1.2 Key Properties of Data Mining Techniques

• Predict the target class for each data point.
• Discover frequent diseases in medical data.
• Inspect the relationship between variables.
• Obtain useful patterns from very large biomedical image databases.
• Examine how patient symptoms and health states are correlated.

1.3 Data mining applications in healthcare

• Healthcare management and inpatient length-of-stay prediction
• Effective treatment and diagnosis
• Detection of abuse and fraud
• Customer relationship management
• Pharmaceutical industry
• Medical device industry

1.4 Statement of the problem


This paper identifies the most commonly used data mining techniques, applications, and data
analysis methods. These help in making successful decisions that improve the success of
healthcare organizations and the health of patients. Data mining techniques are used to reduce
cost while simultaneously improving service quality.

1.5 Data mining challenges in healthcare

• No standard format is laid down for the data being stored.
• Data sharing is another major challenge. Neither patients nor healthcare organizations are
interested in sharing their private data.
• Building a data warehouse in which all the healthcare organizations within a country share
their data is a very costly and time-consuming process.

1.6 Research Questions


What new technologies are being developed to examine different diseases?

Which technique mines data effectively from a huge database?

What are the various application areas of data mining in health care?

Which algorithm is suitable for which disease?

What are the goals of the reviewed articles?

2. Methodology

The methodology of this research is to review different data mining techniques used in health
care and medicine. The techniques and topics covered are decision trees, neural networks (also
called artificial neural networks, ANN), Naïve Bayes, healthcare databases, diagnosis, and
clinical decision support systems (CDSS). The reviewed work applies these data mining techniques
to the diagnosis and prognosis of various diseases.

3. Literature Review


There are different kinds of studies of DM techniques in medical databases: studies that
summarize reviews and challenges in mining medical data in general; studies of DM techniques
used for the diagnosis and prognosis of specific diseases; and studies that present new
frameworks, tools, and applications in medicine and healthcare systems.

4. Results
In healthcare a large amount of data is available, and data mining techniques can be used to
extract hidden information from public healthcare data. The outcome of data mining can help
healthcare practitioners make intelligent clinical decisions, which would be better than those
supported by traditional decision support systems. Treatment cost can be minimized by providing
effective treatment on time. The huge potential of data mining can be grouped in different ways,
for example: healthcare management and inpatient length-of-stay prediction, effective treatment
and diagnosis, detection of abuse and fraud, and customer relationship management. There are
also some specialized applications of data mining in medical technology, including DNA
micro-array analysis and medicine prediction.

4.1 Key findings

• The large amount of health care data needs to be reduced.
• An efficient method to find the appropriate data in the database is required.
• Healthcare analytics uses data mining and big data.
• Cost can be reduced while service quality is improved.


5. Conclusion
Nowadays various organizations use data mining techniques to reduce cost while simultaneously
improving service quality. Health care is one of them, and data mining may bring significant
benefits to the healthcare sector. The benefits include not only the prediction of medical
conditions from a patient's previous history in the database but also support for hospital
management systems, such as the emergency division. New technologies and algorithms would be
helpful in endemic areas to control the spread of diseases and protect health care workers.
Combining more than one data mining technique for diagnosing or predicting diseases could yield
more promising results. Data mining is a breakthrough in computational analysis, going beyond
mathematical and statistical analysis in space, communication technology, labor, biology,
industry, engineering, and the computer sciences, and more recently in medicine. In predictive
medicine, data mining should discover the important factors related to disease conditions by
extracting hidden factors that have never been identified by the usual statistical programs,
which affects both the cost and the quality of care systems.


Acknowledgment

The process of writing this literature report could not have been completed without the support
and assistance of others. First of all, I would like to thank the Almighty God for the wisdom
and courage bestowed upon me during this work. Next, I would like to thank my advisor, Dr. Anita
(PhD), and the authors who contributed different ideas to this review paper. Finally, thanks to
all who were ready and eager to help me and for their constructive comments at every step of the
work.


