International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print),
ISSN 0976-6375 (Online), Volume 5, Issue 6, June (2014), pp. 88-98 © IAEME











MINING SIGNATURES FROM EVENT SEQUENCES AND VISUAL
INTERACTIVE KNOWLEDGE DISCOVERY IN LARGE ELECTRONIC
HEALTH RECORD DATABASES


S. A. Sarwade¹, Prof. R. K. Makhijani²

¹,²(Computer Science and Engineering Department, SSGBCOET, Bhusawal, NMU (M.S.), India)



ABSTRACT

Standardization and wider use of Electronic Health Records (EHR) create opportunities for
better understanding patterns of illness and care within and across medical systems. In healthcare
systems, hidden event signatures support decisions on a patient's diagnosis, prognosis, and
management. The temporal history of event codes embedded in patients' records can be examined for
frequently occurring sequences of event codes across patients. An existing framework enables the
representation, retrieval, and mining of high-order latent event structure and relationships within
single and multiple event sequences. Large databases hold a wealth of hidden information, and
different data mining techniques can be used to retrieve it. This paper presents a classifier approach
for the detection of diabetes and shows how Naive Bayes can be used for classification. In this
system, medical data is grouped into five categories, namely low, average, high, very high, and
critical, and treatment is given according to the predicted category. The system predicts the class
label of an unknown sample; hence two basic functions, classification (training) and prediction
(testing), are performed. The algorithm and the database used affect the accuracy of the system. The
system can answer complex queries for diagnosing diabetes and thus assist healthcare practitioners in
making intelligent clinical decisions, which traditional decision support systems cannot. Over the
last decade, many information visualization techniques have been developed to support the
exploration of large data sets, and various interactive visual data mining tools are available for
visual data analysis. It is possible to perform clinical assessment for visual interactive knowledge
discovery in large electronic health record databases. In this paper, we propose that a data
visualization tool can be developed for interactive knowledge discovery.

Keywords: Data Mining, Diabetes Disease, Decision Support, Naive Bayes.


INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING &
TECHNOLOGY (IJCET)



ISSN 0976 – 6367(Print)
ISSN 0976 – 6375(Online)
Volume 5, Issue 6, June (2014), pp. 88-98
© IAEME: www.iaeme.com/IJCET.asp
Journal Impact Factor (2014): 8.5328 (Calculated by GISI)
www.jifactor.com



1. INTRODUCTION

In today's fast-moving world, people want to live comfortable and luxurious lives, so they
work like machines to earn a lot of money. In this race they forget to take care of themselves:
their food habits and entire lifestyle change, they are under more stress, they develop high blood
pressure and high blood sugar at a very young age, they do not get enough rest, and they eat whatever
is at hand. This negligence gives rise to a major threat, diabetes. Medical organizations (hospitals,
medical centers) generate large amounts of data. Data mining is the non-trivial extraction of
potentially useful information from data. Data mining techniques give people new power to research
and manipulate existing large quantities of data. The data mining process finds interesting
information hidden in the data, which can be used for future prediction and for intelligent
summarization of the data. The knowledge discovery process consists of an iterative sequence of data
cleaning, data integration, data selection, data mining, and knowledge presentation. Data mining is
the search for relationships and global patterns that exist in large databases but are hidden among
large volumes of data. Data mining techniques have been applied successfully in areas such as
engineering, marketing, medicine, finance, and car manufacturing. The design and manufacturing
domain is a natural candidate for data mining applications because it contains extensive data.
Besides enhancing innovation, data mining methods can reduce the risks associated with conducting
business and improve decision-making. A target dataset must be assembled before data mining
algorithms can be used, especially in profiling practices such as surveillance and fraud detection.
Because data mining can only uncover patterns already present in the data, the target dataset must
be large enough to contain these patterns while remaining concise enough to be mined within an
acceptable time limit. A common source of data is a data warehouse. Because data marts and data
warehouses are significant repositories, preprocessing is essential for analyzing multivariate
datasets before any clustering or data mining task is performed. Data mining tasks such as
clustering, association rule mining, sequence pattern mining, and classification are used in many
applications. The most widely used classification algorithms include Bayesian algorithms, decision
trees, and neural networks.
Diabetes mellitus, or simply diabetes, is a set of related diseases in which the body cannot
regulate the level of sugar in the blood. It is a group of metabolic diseases in which a person has
high blood sugar, either because the body does not produce enough insulin or because cells do not
respond to the insulin produced. Patients with high blood sugar typically experience polyuria
(frequent urination) and become increasingly thirsty (polydipsia) and hungry (polyphagia).
There are three main types of diabetes. Type 1 diabetes results from the body's failure to produce
insulin and requires the person to inject insulin or wear an insulin pump. It was previously
referred to as "insulin-dependent diabetes mellitus" or "juvenile diabetes". People usually develop
type 1 diabetes before the age of 40, often in early adulthood or the teenage years. Type 2 diabetes
results from insulin resistance, a condition in which cells fail to use insulin properly, sometimes
combined with an absolute insulin deficiency. It was previously referred to as
"non-insulin-dependent diabetes mellitus" or "adult-onset diabetes". The third main form, gestational
diabetes, occurs when pregnant women without a previous diagnosis of diabetes develop a high blood
glucose level. It may precede the development of type 2 diabetes.
Diabetes is one of the leading causes of death by disease worldwide. As of 2000, an estimated
171 million people globally, or 2.8% of the population, suffered from diabetes.
Type 2 diabetes is the most common type worldwide [2]. Figures for 2007 show that the five
countries with the largest numbers of people diagnosed with diabetes were India (40.9 million), China
(38.9 million), the US (19.2 million), Russia (9.6 million), and Germany (7.4 million) [2].

Data mining refers to extracting or mining knowledge from large amounts of data. The
healthcare industry collects huge amounts of healthcare data which, unfortunately, are not "mined"
to discover hidden information for effective decision making. Discovering hidden patterns and
relationships is often difficult, and advanced data mining techniques can help remedy this situation.
The aim of data mining is to make sense of large amounts of mostly unsupervised data. Classification
maps data into predefined groups. It is also called supervised learning because the classes are
determined prior to examining the data. Classification algorithms define the classes based on the
data attribute values and describe each class by looking at the features of data already known to
belong to it. Pattern recognition is a type of classification in which an input pattern is assigned
to one of several classes based on its similarity to these predefined classes. Knowledge
Discovery in Databases (KDD) is the process of finding useful information and patterns in data;
it involves selection, pre-processing, transformation, data mining, and evaluation.
In this paper, we propose a Naïve Bayes based method to diagnose diabetes. The attributes
used in our proposed method are those used for the diagnosis of diabetes.

2. RELATED WORK

Wang et al. [1] developed a matrix approximation-based technique to detect hidden
signatures in event sequences, together with an online updating technique. It enables the
representation, extraction, and mining of latent event structure and relationships within single and
multiple event sequences. The knowledge representation maps heterogeneous event sequences to
a geometric image by encoding events as a structured spatial-temporal shape process.
Jyoti Soni et al. [3] applied three supervised machine learning algorithms, Naïve
Bayes, K-NN, and Decision List, to analyze a heart disease dataset. The Tanagra data mining tool
was used to classify the data. The classified data were evaluated using 10-fold cross validation and
the results were compared.
Pardha Repalli [4] predicted how likely people in different age groups are to be affected by
diabetes based on their lifestyle activities, and identified the factors responsible for an
individual being diabetic. Statistics from the Centers for Disease Control state that 26.9% of
people aged 65 years or older, 11.8% of all men aged 20 years or older, and 10.8% of all women aged
20 years or older are affected by diabetes.
G. Parthiban et al. [5] predicted the chances of a diabetic patient developing heart
disease. They applied the Naïve Bayes classifier, which produces an optimal prediction model using
a minimum training set. Their system uses attributes such as sex, age, blood pressure, and blood
sugar to predict the chances of a diabetic patient developing heart disease. The data set was a
clinical data set collected from one of the leading diabetic research institutes in Chennai and
contains records of about 500 patients. The clinical data set specification provides concise,
unambiguous definitions for items related to diabetes. The WEKA tool was used for data mining.
K. Rajesh and V. Sangeetha [6] applied several classification algorithms to a diabetes dataset
and analyzed their performance. Their paper aims at mining the relationships in diabetes data for
efficient classification, and explores data mining methods and techniques to identify those suitable
for efficient classification of the diabetes dataset and for mining useful patterns.
A large number of information visualization techniques have been developed
over the last decade to support the exploration of large data sets. The advantage of visual
data exploration is that the user is directly involved in the data mining process. Daniel A. Keim [7]
proposes a classification of information visualization and visual data mining techniques which is
based on the data type to be visualized, the visualization technique used and the interaction and
distortion technique.

3. RESEARCH OBJECTIVE

The main objective of this research is to develop a prediction system from event sequences
using the Naive Bayes data mining algorithm. The system can discover and extract hidden knowledge
associated with diabetes from a historical diabetes database. It can answer complex queries for
diagnosing the disease and thus assist healthcare practitioners in making intelligent
clinical decisions. To enhance visualization and ease of interpretation, it displays the results in
tabular form and in three-dimensional graphical form.

4. PROPOSED SYSTEM

The proposed system predicts the treatment required from the attribute values of different
lab tests taken at different times for a disease. The Naïve Bayes classifier is applied, which
produces an optimal prediction model using a minimum training set. The proposed system presents the
data in interactive three-dimensional formats. Various interactive visual data mining tools are
available for visual data analysis, but in this system, instead of using ready-made tools, the
interactive visualizations are developed in Java; they are useful for data analysis, for predicting
results, and for the future care of patients.

4.1. Dataset Used
Clinical databases have accumulated large quantities of information about patients and their
medical conditions. The data set used in this work contains records of about 300 patients. The
clinical data set specification provides concise, unambiguous definitions for items related to
diabetes. Two datasets are used in this project: a training dataset and a testing dataset. The
testing dataset is further divided into two datasets, single-patient data and multiple-patient data.
The records were split equally into the training dataset and the testing dataset. The training
dataset used for data mining classification contains 1000 record samples, each having 13 attributes.
The diabetes attributes used in our proposed system and their descriptions are shown in Table 1.

Table 1: Diabetes Attributes Considered in the Dataset

Attribute       Description
Age             Age of the patient
Sex             A classification of the sex of the person
HBA1C           Glycated hemoglobin level, measured primarily to identify the average plasma glucose concentration over prolonged periods of time
Blood Pressure  Blood pressure
Plasma Glucose  Glucose level in blood
Cholesterol     Total cholesterol level
Hemoglobin      Level of hemoglobin in blood
Pulse Rate      Number of times the heart beats in one minute
Hypertension    High blood pressure
Hereditary      Whether the disease or disorder is inherited
Foot Ulcers     Sores on the feet



4.2 System Design
The diagrammatic representation of the proposed system design is given in Figure 1.


Figure.1: Proposed Architecture

4.3 Algorithm Used
For implementing the system, the Naïve Bayes classifier is used as the data mining algorithm.
A Naïve Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem
with strong independence assumptions: it assumes that, given the class, the presence or absence of
a particular feature is unrelated to the presence or absence of any other feature. The Naïve
Bayes algorithm is based on conditional probabilities. The technique is particularly suited to
inputs of high dimensionality. Despite its simplicity, Naïve Bayes can often outperform more
sophisticated classification methods. Here, the Naïve Bayes algorithm identifies the
characteristics of patients with diabetes and shows the probability of each input attribute for
the predictable state.
The naïve Bayesian classifier, or simple Bayesian classifier [8], works as follows:

1. Let D be a training set of tuples and their associated class labels. Each tuple is represented by
an n-dimensional attribute vector, X = (x1, x2, …, xn), holding n measurements made on the tuple
from n attributes, respectively, A1, A2, …, An.
2. Suppose that there are m classes, C1, C2, …, Cm. For a given tuple X, the classifier predicts
that X belongs to the class having the highest posterior probability, conditioned on X. That is, the
naïve Bayes classifier predicts that tuple X belongs to the class Ci if and only if
P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i.
Thus we maximize P(Ci|X). The class Ci for which P(Ci|X) is maximized is called the maximum
a posteriori hypothesis. By Bayes' theorem,
$$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}$$

3. As P(X) is constant for all classes, only P(X|Ci)P(Ci) need be maximized. When the class prior
probabilities are not known, it is commonly assumed that the classes are equally likely, that is,
P(C1) = P(C2) = … = P(Cm), and we would therefore maximize P(X|Ci). Otherwise, we maximize
P(X|Ci)P(Ci). Note that the class prior probabilities may be estimated by P(Ci) = |Ci,D|/|D|, where
|Ci,D| is the number of training tuples of class Ci in D.
4. If a data set has many attributes, it is computationally very expensive to
compute P(X|Ci). In order to reduce computation in evaluating P(X|Ci), the naïve assumption of
class conditional independence is made. This assumes that the values of the attributes are
conditionally independent of one another, given the class label of the tuple (i.e., that there are no
dependence relationships among the attributes). Thus,

$$P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i) = P(x_1 \mid C_i) \times P(x_2 \mid C_i) \times \cdots \times P(x_n \mid C_i).$$

We can easily estimate the probabilities P(x1|Ci), P(x2|Ci), …, P(xn|Ci) from the training tuples.
Here xk refers to the value of attribute Ak for tuple X.
5. In order to predict the class label of X, P(X|Ci)P(Ci) is evaluated for each class Ci. The
classifier predicts that the class label of tuple X is the class Ci if and only if
P(X|Ci)P(Ci) > P(X|Cj)P(Cj) for 1 ≤ j ≤ m, j ≠ i.
In other words, the predicted class label is the class Ci for which P(X|Ci)P(Ci) is the maximum.
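
The five steps above can be sketched in Python as follows. This is a minimal illustration, not the system's actual implementation: it assumes categorical attributes and raw frequency estimates without any smoothing, and the class name `CategoricalNaiveBayes`, its methods, and the toy records are hypothetical.

```python
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for categorical attributes (hypothetical helper class)."""

    def fit(self, tuples, labels):
        self.n_total = len(labels)               # |D|
        self.class_counts = Counter(labels)      # |Ci,D| for each class Ci
        # value_counts[c][k][v]: number of class-c tuples whose k-th attribute equals v
        self.value_counts = defaultdict(lambda: defaultdict(Counter))
        for x, c in zip(tuples, labels):
            for k, value in enumerate(x):
                self.value_counts[c][k][value] += 1
        return self

    def score(self, x, c):
        """Return P(X|Ci) * P(Ci) under class-conditional independence."""
        prior = self.class_counts[c] / self.n_total
        likelihood = 1.0
        for k, value in enumerate(x):
            likelihood *= self.value_counts[c][k][value] / self.class_counts[c]
        return prior * likelihood

    def predict(self, x):
        """Return the class Ci that maximizes P(X|Ci) * P(Ci)."""
        return max(self.class_counts, key=lambda c: self.score(x, c))


# Hypothetical toy records: (blood pressure, hereditary) -> class label
train_x = [("high", "yes"), ("high", "no"), ("normal", "no"), ("normal", "yes")]
train_y = ["diabetic", "diabetic", "healthy", "diabetic"]
model = CategoricalNaiveBayes().fit(train_x, train_y)
print(model.predict(("normal", "no")))
```

Because P(X) is the same for every class, the sketch never computes it; it compares only the products P(X|Ci)P(Ci), as in step 3. An attribute value never seen with a class yields a zero product for that class in this unsmoothed version.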

4.3.1 An Example
The following example is a simple demonstration of applying the Naïve Bayes Classifier.
This example [8] shows how to calculate the probability using Naïve Bayes classification algorithm.

Table 2: Class-Labeled Training Tuples from the Electronics Customer Database

Cust_ID  Cust_Age     Cust_Income  Is_Student  Credit_Rating  Class: Can_Buy_computer
1        Youth        High         No          Fair           No
2        Youth        High         No          Excellent      No
3        Middle_Aged  High         No          Fair           Yes
4        Senior       Medium       No          Fair           Yes
5        Senior       Low          Yes         Fair           Yes
6        Senior       Low          Yes         Excellent      No
7        Middle_Aged  Low          Yes         Excellent      Yes
8        Youth        Medium       No          Fair           No
9        Youth        Low          Yes         Fair           Yes
10       Senior       Medium       Yes         Fair           Yes
11       Youth        Medium       Yes         Excellent      Yes
12       Middle_Aged  Medium       No          Excellent      Yes
13       Middle_Aged  High         Yes         Fair           Yes
14       Senior       Medium       No          Excellent      No

We wish to predict the class label of a tuple using naïve Bayesian classification, given the
training data in Table 2. The data tuples are described by the attributes Cust_Age, Cust_Income,
Is_Student, and Credit_Rating.
The class label attribute, Can_Buy_computer, has two distinct values (namely, {yes, no}).
Let
C1 correspond to the class Can_Buy_computer=yes and
C2 correspond to Can_Buy_computer=no.
The tuple we wish to classify is
X = (Cust_Age=youth, Cust_Income=medium, Is_Student=yes, Credit_Rating=fair)
We need to maximize P(X|Ci)P(Ci), for i=1, 2. P(Ci), the prior probability of each class, is
computed based on the training tuples:
P(Can_Buy_computer=yes) = 9/14 = 0.643
P(Can_Buy_computer=no) = 5/14 = 0.357
To compute P(X|Ci), for i=1, 2, we compute the following conditional probabilities:
P(Cust_Age=youth|Can_Buy_computer=yes) = 2/9 = 0.222
P(Cust_Age=youth|Can_Buy_computer=no) = 3/5 = 0.600
P(Cust_Income=medium|Can_Buy_computer=yes) = 4/9 = 0.444
P(Cust_Income=medium|Can_Buy_computer=no) = 2/5 = 0.400
P(Is_Student=yes|Can_Buy_computer=yes) = 6/9 = 0.667
P(Is_Student=yes|Can_Buy_computer=no) = 1/5 = 0.200
P(Credit_Rating=fair|Can_Buy_computer=yes) = 6/9 = 0.667
P(Credit_Rating=fair|Can_Buy_computer=no) = 2/5 = 0.400
Using the above probabilities, we obtain
P(X|Can_Buy_computer=yes) = P(Cust_Age=youth|Can_Buy_computer=yes)
x P(Cust_Income=medium|Can_Buy_computer=yes) x P(Is_Student=yes|Can_Buy_computer=yes)
x P(Credit_Rating=fair|Can_Buy_computer=yes)
= 0.222 x 0.444 x 0.667 x 0.667 = 0.044
Similarly,
P(X|Can_Buy_computer=no) = 0.600 x 0.400 x 0.200 x 0.400 = 0.019.
To find the class, Ci, that maximizes P(X|Ci)P(Ci), we compute
P(X|Can_Buy_computer=yes) P(Can_Buy_computer=yes) = 0.044 x 0.643 = 0.028
P(X|Can_Buy_computer=no) P(Can_Buy_computer=no) = 0.019 x 0.357 = 0.007
Therefore, the naïve Bayesian classifier predicts
Can_Buy_computer=yes for tuple X.
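
The arithmetic of this example can be checked with a short Python sketch that counts directly over the 14 training tuples of Table 2. The helper name `class_score` is hypothetical, attribute values are written in lower case for convenience, and exact fractions are used so that rounding happens only when printing.

```python
from fractions import Fraction

# Table 2 rows: (Cust_Age, Cust_Income, Is_Student, Credit_Rating, Can_Buy_computer)
ROWS = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]


def class_score(x, label):
    """Exact P(X|C) * P(C), estimated from counts in ROWS."""
    members = [row for row in ROWS if row[-1] == label]
    score = Fraction(len(members), len(ROWS))      # prior P(C)
    for k, value in enumerate(x):
        matches = sum(1 for row in members if row[k] == value)
        score *= Fraction(matches, len(members))   # P(x_k | C)
    return score


X = ("youth", "medium", "yes", "fair")
scores = {label: class_score(X, label) for label in ("yes", "no")}
prediction = max(scores, key=scores.get)
for label, value in scores.items():
    print(label, round(float(value), 3))
print("predicted Can_Buy_computer:", prediction)
```

Rounded to three decimals, the two scores agree with the values 0.028 and 0.007 derived above, and the predicted class is yes.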

4.3.2 Implementation on patient data
The Naïve Bayes algorithm calculates the probability of each attribute of the patient's record. It
then calculates the Yes or No probability and gives the severity of the disease, as shown in Figure 2.
It learns from the "evidence" by calculating the correlation between the target (i.e., dependent)
and other (i.e., independent) variables.
















Figure.2: Implementation of Naïve Bayes on the patient's Data
[Flowchart blocks: Data Set; Enter patient record; Naïve Bayes; Calculate probability of each attribute; Calculate Yes or No probability; Tell about the risk]

4.4 Implementation of the System
The training dataset is given as input to the classifier. The classified data is then used for
testing purposes. We have used the Naive Bayes algorithm. The system works in three phases: training,
testing, and visualization.

4.4.1 Training Phase: Classification assumes labeled data: we know how many classes there are
and we have examples for each class. Classification is therefore supervised. The classifier
constructs a model from the training set and the values (class labels) of a classifying attribute,
and uses the model to classify new data.

4.4.2 Testing Phase: The testing phase involves predicting the class of unknown data samples. In
testing, we check data that are not part of the training dataset. After prediction, we obtain the
class labels.

4.4.3 Visualization phase: Graphing and visualization tools are a vital aid in data preparation, and
their importance to effective data analysis cannot be overemphasized. Data visualization provides
features that lead to new insights. Data is visualized in the form of 3D pie charts, event
charts, and bar charts. By selecting the visualization parameter and visualization type, the user can
see a three-dimensional graph of the required parameter, which is useful for the analysis of the
disease and for the future care of patients.

5. RESULTS AND ANALYSIS

The final output indicates whether the person is affected by diabetes, its severity, and the
treatment appropriate to that severity. Results and analysis are based on the health record
dataset.


Figure.2 Diagnosis and Treatment Prediction for Single Patient

After the test dataset is imported, a table is displayed showing the predicted disease, its
severity, and the treatment required for that severity, as shown in Fig. 2. Fig. 2 also shows the
results of the lab tests conducted at different events for a single patient.


Figure.3 Pie Chart for Parameter Age
Figure.4 Bar Chart for Parameter Blood Pressure

Fig. 3 shows the distribution of diabetic patients across age groups such as youth, middle
aged, and senior, obtained by selecting Age as the visualization parameter. The pie chart can be
explored further to display other parameters within each age value. It is also possible to display a
pie chart showing the distribution of diabetic patients by gender, blood pressure, hypertension, or
any other visualization parameter. Figure 4 shows the 3D bar chart for the visualization parameter
blood pressure. The 3D bar chart shows the blood pressure values, i.e. normal, mild, and high, on the
X-axis and the probability of having diabetes on the Y-axis. This type of bar chart can be displayed
for other visualization parameters as well. This interactive graphical visualization is useful for
analysts designing healthcare systems for patients. It helps to analyze, for example, the
percentage of youth having diabetes or the percentage of people with high blood pressure having
diabetes. The results become easy and clear for the analyst, which helps in arranging awareness
programs for selected types of patients.
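
The per-group percentages behind such charts can be computed with a simple aggregation. The Python sketch below is only an illustration under stated assumptions: the function `diabetic_share_by`, the field names, and the four records are all hypothetical and are not taken from the paper's dataset or from its Java implementation.

```python
from collections import Counter


def diabetic_share_by(records, parameter):
    """Return {parameter value: percentage of records in that group labeled diabetic}."""
    totals, diabetic = Counter(), Counter()
    for rec in records:
        totals[rec[parameter]] += 1
        if rec["diabetic"]:
            diabetic[rec[parameter]] += 1
    return {value: 100.0 * diabetic[value] / totals[value] for value in totals}


# Hypothetical records, for illustration only (not the paper's data)
records = [
    {"age": "youth", "blood_pressure": "normal", "diabetic": False},
    {"age": "youth", "blood_pressure": "high", "diabetic": True},
    {"age": "senior", "blood_pressure": "high", "diabetic": True},
    {"age": "senior", "blood_pressure": "mild", "diabetic": True},
]
by_age = diabetic_share_by(records, "age")
by_pressure = diabetic_share_by(records, "blood_pressure")
print(by_age)
print(by_pressure)
```

Changing the `parameter` argument corresponds to selecting a different visualization parameter; the resulting dictionary holds the values a pie or bar chart would plot.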
Table 3 shows the accuracy of the system obtained by changing the number of instances in the
testing dataset.

Table 3: Accuracy (%)

No. of Records in  No. of Records in  No. of Correctly      No. of Incorrectly    Accuracy
Training Dataset   Testing Dataset    Classified Instances  Classified Instances  (%)
909                301                266                   35                    88.37
909                250                215                   35                    86
909                200                179                   21                    89.5
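
The accuracy column of Table 3 is the share of correctly classified instances among all test instances. A short Python check of the three rows (the function name `accuracy_percent` is an assumption made for this illustration):

```python
def accuracy_percent(correct, incorrect):
    """Accuracy (%) = correct / (correct + incorrect) * 100."""
    return 100.0 * correct / (correct + incorrect)


# (training records, testing records, correct, incorrect), copied from Table 3
TABLE_3 = [(909, 301, 266, 35), (909, 250, 215, 35), (909, 200, 179, 21)]

for train_n, test_n, correct, incorrect in TABLE_3:
    # Correct and incorrect counts must add up to the testing dataset size.
    assert correct + incorrect == test_n
    print(test_n, round(accuracy_percent(correct, incorrect), 2))
```

For the first row this gives 266 / 301 × 100 ≈ 88.37, matching the table.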

Fig. 5 shows the classified data according to treatment in the form of a pie chart. This chart
shows the distribution of the treatment required by the patients. For example, there are 18% of patients in the
dataset requiring a Random Plasma Glucose Test. Fig. 6 shows the correctly and incorrectly classified
records in the dataset.


Figure.5: Classified Data and Treatment Required
Figure.6: Accuracy of the System


6. CONCLUSION AND FUTURE WORK

Integrating visualization techniques with more established methods combines fast
automatic data mining algorithms with the intuitive power of the human mind, which improves the
quality and speed of the data mining process. A decision support and disease prediction system for
diabetes, working from event sequences, has been developed using the Naive Bayes data mining
technique. The disease diagnosis system extracts hidden knowledge from a historical diabetes
database. The model is effective for predicting the treatment required for patients with the
disease. The system answers complex queries, each with its own strength with respect to ease of model
interpretation, access to information, and accuracy. The system is useful for guiding diabetic
patients during the disease, and diabetes patients could benefit from the diabetes monitoring system.
The diabetes diagnosis system is intended not only for diabetic patients but also for people who
suspect that they are diabetic.
The disease prediction and visualization system can be extended to other diseases such as HIV,
lung cancer, breast cancer, and stomach cancer. It can be further enhanced by including other data
mining techniques. Also, instead of only categorical data, continuous data can be used.

REFERENCES

[1] Fei Wang, Noah Lee, Jianying Hu, Jimeng Sun, Shahram Ebadollahi and Andrew F. Laine, "A
Framework for Mining Signatures from Event Sequences and Its Applications in Healthcare
Data", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 35, No. 2,
February 2013.
[2] http://diabetes.co.in.
[3] Jyoti Soni, Ujma Ansari, Dipesh Sharma, Sunita Soni, "Predictive Data Mining for Medical
Diagnosis: An Overview of Heart Disease Prediction", IJCSE, Vol. 3, No. 6, June 2011.
[4] Pardha Repalli, "Prediction on Diabetes Using Data mining Approach".
[5] G. Parthiban, A. Rajesh, S. K. Srivatsa, "Diagnosis of Heart Disease for Diabetic Patients
using Naive Bayes Method", International Journal of Computer Applications (0975 – 8887),
Volume 24, No. 3, June 2011.
[6] K. Rajesh, V. Sangeetha, "Application of Data Mining Methods and Techniques for Diabetes
Diagnosis", International Journal of Engineering and Innovative Technology (IJEIT), Volume
2, Issue 3, September 2012.
[7] Daniel A. Keim, "Information Visualization and Visual Data Mining", IEEE Transactions on
Visualization and Computer Graphics, Vol. 8, No. 1, January-March 2002.
[8] Mrs. G. Subbalakshmi, Mr. K. Ramesh, Mr. M. Chinna Rao, "Decision Support in Heart
Disease Prediction System using Naive Bayes", Indian Journal of Computer Science and
Engineering (IJCSE), Vol. 2, No. 2, pp. 170-176, Apr-May 2011.
[9] Sellappan Palaniappan, Rafiah Awang, "Intelligent Heart Disease Prediction System Using
Data Mining Techniques", 978-1-4244-1968-5/08/$25.00 ©2008 IEEE.
[10] Mai Shouman, Tim Turner, Rob Stocker, "Using data mining techniques in heart disease
diagnosis and treatment", Japan-Egypt Conference on Electronics, Communications and
Computers, 978-1-4673-0483-2 ©2012 IEEE.
[11] N. Aaditya Sunder, P. Pushpa Latha, "Performance analysis of classification data mining
techniques over heart disease database", International Journal of Engineering Science and
Advance Technology, Vol. 2, Issue 3, pp. 470-478, May-June 2012.
[12] R. Bhuvaneswari and K. Kalaiselvi, “Naive Bayesian Classification Approach in Healthcare
Applications”, International Journal of Computer Science and Telecommunications,
Volume 3, Issue 1, January 2012.
[13] Rinal H. Doshi, Dr. Harshad B. Bhadka and Richa Mehta, “Development of Pattern
Knowledge Discovery Framework using Clustering Data Mining Algorithm”, International
Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 3, 2013,
pp. 101 - 112, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[14] Chaitrali S. Dangare and Dr. Sulabha S. Apte, “A Data Mining Approach for Prediction of
Heart Disease using Neural Networks”, International Journal of Computer Engineering &
Technology (IJCET), Volume 3, Issue 3, 2012, pp. 30 - 40, ISSN Print: 0976 – 6367,
ISSN Online: 0976 – 6375.
[15] Asst. Prof. Jameelah H. Suad and Wurood A. Jbara, “Subjective Quality Assessment of New
Medical Image Database”, International Journal of Computer Engineering & Technology
(IJCET), Volume 4, Issue 5, 2013, pp. 155 - 164, ISSN Print: 0976 – 6367, ISSN Online:
0976 – 6375.
[16] Faimida M. Sayyad, “Proposed Remote Healthcare System for Rural Development”,
International Journal of Information Technology and Management Information Systems
(IJITMIS), Volume 4, Issue 1, 2013, pp. 16 - 23, ISSN Print: 0976 – 6405, ISSN Online:
0976 – 6413.
[17] P.N.Santosh Kumar, Dr. C.Venugopal and Dr. C.Sunil Kumar, “Applications of Data Mining
in Medical Databases”, International Journal of Computer Engineering & Technology
(IJCET), Volume 4, Issue 6, 2013, pp. 284 - 289, ISSN Print: 0976 – 6367, ISSN Online:
0976 – 6375.