INTRODUCTION
In day-to-day life, many factors affect the human heart. Health problems are occurring at
a rapid pace, and new heart diseases are being identified just as quickly. In today's stressful
world the heart, the essential organ that pumps blood through the body, is central to
circulation, and its health must be preserved for healthy living. The health of a
human heart reflects the experiences of a person's life and depends heavily on the
professional and personal behaviors of that person.
There are also several genetic factors through which a type of heart disease can be passed
down through generations. According to the World Health Organization, every year more than 12
million deaths occur worldwide due to the various types of heart disease, also known
collectively as cardiovascular disease. The term heart disease covers many diverse conditions
that specifically affect the heart and arteries of a human being. Even young people in their
twenties and thirties are being affected by heart diseases.
The increased possibility of heart disease among the young may be due to bad eating
habits, lack of sleep, restlessness, and depression, along with numerous other factors such as
obesity, poor diet, family history, high blood pressure, high blood cholesterol, sedentary
behavior, smoking, and hypertension. The diagnosis of heart disease is both very important and
among the most complicated tasks in the medical field. Doctors take all of the mentioned
factors into consideration when analyzing and understanding patients through manual
check-ups at regular intervals. The symptoms of heart disease depend greatly upon
the kind of discomfort felt by an individual, and some symptoms are not easily recognized by
lay people.
However, common symptoms include chest pain, breathlessness, and heart palpitations.
The chest pain common to many types of heart disease is known as angina, or angina pectoris,
and occurs when a part of the heart does not receive enough oxygen. Angina may be triggered by
stressful events or physical exertion and normally lasts under 10 minutes. Heart attacks can also
occur as a result of different types of heart disease. The signs of a heart attack are similar to
angina except that they can occur during rest and tend to be more severe. The symptoms of a
heart attack can sometimes resemble indigestion.
Heartburn and a stomach ache can occur, as well as a heavy feeling in the chest. Other
symptoms of a heart attack include pain that travels through the body, for example from the chest
to the arms, neck, back, abdomen, or jaw, lightheadedness and dizzy sensations, profuse
sweating, nausea and vomiting.
Heart failure is also an outcome of heart disease, and breathlessness can occur when the
heart becomes too weak to circulate blood. Some heart conditions occur with no symptoms at all,
especially in older adults and individuals with diabetes. The term 'congenital heart disease'
covers a range of conditions, but the general symptoms include sweating, high levels of fatigue,
fast heartbeat and breathing, breathlessness, and chest pain. However, these symptoms might not
develop until a person is older than 13 years. In such cases, the diagnosis becomes an
intricate task requiring great experience and high skill. If the risk of a heart attack or the
possibility of heart disease is identified early, patients can take precautions and preventive
measures. Recently, the healthcare industry has been generating huge amounts of data about
patients, and their disease diagnosis reports are increasingly used for the prediction of heart
attacks worldwide. When the available heart disease data is large, machine learning techniques
can be applied for the analysis.
Data mining is the task of extracting vital decision-making information from a
collection of past records for future analysis or prediction. The information may be hidden and
unidentifiable without the use of data mining. Classification is one data mining technique
through which future outcomes or predictions can be made based on the historical data that is
available. Medical data mining makes it possible to integrate classification
techniques and provide computerized training on a dataset, which in turn exposes the
hidden patterns in medical data sets that can be used for the prediction of a patient's future
state. Thus, by using medical data mining it is possible to provide insights into a patient's
history and to provide clinical support through the analysis. For clinical analysis of patients,
these patterns are essential. In simple terms, medical data mining uses
classification algorithms that are vital for identifying the possibility of a heart attack before
it occurs.
Classification algorithms can be trained and tested to make predictions that
determine whether a person is likely to be affected by heart disease. In this research work,
supervised machine learning is used for making the predictions. A comparative
analysis of three data mining classification algorithms, namely Random Forest, Decision Tree,
and Naïve Bayes, is carried out. The analysis is done at several levels of cross
validation and at several percentage-split evaluation ratios.
The StatLog dataset from the UCI machine learning repository is used for making the heart
disease predictions in this research work. The predictions are made using the classification model
that is built when the classification algorithms are trained on the heart disease dataset.
This final model can then be used for heart disease prediction.
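The comparative setup described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the exact experimental protocol: the StatLog file is not bundled here, so a synthetic stand-in with the same shape (13 attributes, binary class) is generated, and default hyperparameters are assumed.

```python
# Sketch: Random Forest, Decision Tree and Naive Bayes compared with 10-fold
# cross validation and a 70/30 percentage split, as in the analysis above.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the StatLog heart data: 270 records, 13 attributes.
X, y = make_classification(n_samples=270, n_features=13, random_state=0)

models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}

for name, model in models.items():
    cv_acc = cross_val_score(model, X, y, cv=10).mean()        # cross validation
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=0)                   # percentage split
    split_acc = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: CV accuracy={cv_acc:.3f}, split accuracy={split_acc:.3f}")
```

The same loop can be repeated for other fold counts and split ratios to reproduce the "several levels" of evaluation mentioned above.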
CHAPTER 2
LITERATURE SURVEY
2.1 PREDICTING SURVIVAL CAUSES AFTER OUT OF HOSPITAL CARDIAC
ARREST USING DATA MINING METHOD
AUTHORS
FRANCK LE DUFF
CRISTIAN MUNTEAN
MARC CUGGIA
PHILIPPE MABO
In this paper [1] the authors stated that the prognosis of life for patients with heart failure
remains poor. By using data mining methods, the purpose of this study was to evaluate the most
important criteria for predicting patient survival and to profile patients to estimate their survival
chances together with the most appropriate technique for health care. METHODS: Five hundred
and thirty three patients who had suffered from cardiac arrest were included in the analysis.
They performed classical statistical analysis and data mining analysis using mainly
Bayesian networks. RESULTS: The mean age of the 533 patients was 63 (± 17) and the sample
was composed of 390 (73 %) men and 143 (27 %) women. Cardiac arrest was observed at home
for 411 (77 %) patients, in a public place for 62 (12 %) patients and on a public highway for 60
(11 %) patients. The belief network of the variables showed that the probability of remaining
alive after heart failure is directly associated to five variables: age, sex, the initial cardiac rhythm,
the origin of the heart failure and specialized resuscitation techniques employed.
CONCLUSIONS: Data mining methods could help clinicians to predict the survival of
patients and then adapt their practices accordingly. This work could be carried out for each
medical procedure or medical problem and it would become possible to build a decision tree
rapidly with the data of a service or a physician. The comparison between classic analysis and
data mining analysis showed the contribution of the data mining method for sorting variables
and quickly concluding on the importance or impact of the data and variables on the criterion
of the study. The main limit of the method is knowledge acquisition and the necessity to gather
sufficient data to produce a relevant model.
Cardiac arrest is defined as a spontaneous, irreversible arrest of the general circulation due to
cardiac inefficacy. It is recognized by the absence of a femoral pulse for more than 5
seconds. Without resuscitation, cardiac arrest leads to sudden cardiac death. The public health
impact of sudden cardiac death is heavy since the survival rate is estimated at between 1 and 20
% for cardiac arrest patients. This represents 300,000 to 400,000 deaths a year in the United
States and 20,000 to 30,000 deaths in France. The profile of the patient is now well known since
it generally concerns men from about 45 to 75 years of age. Hospitalization must be optimal and
fast. According to the type of cardiac attack, the care procedure can vary, and some studies [11]
show the advantage of certain techniques over others, depending on the cause of the cardiac
arrest. Heart disease is the number one cause of death in the U.S. According to the
American Heart Association, an estimated 1,100,000 people in the U.S. will have a coronary
attack each year.
Ninety five percent of sudden cardiac arrest victims die before reaching the hospital and
heart disease claims more lives each year than the following six leading causes of death
combined (cancer, chronic lower respiratory diseases, accidents, diabetes mellitus, influenza and
pneumonia). Almost 150,000 people in the U.S. who die from heart disease each year are under
the age of 65. These data show the interest for predicting the risk of death after heart failure and
the need to analyze the events that occurred during care to provide prognostic information.
Classic statistical analyses have already been done and provide some information about
epidemiology of heart failure and its causes. This paper presents the use of a probabilistic,
statistical approach to profile heart failure in a sample of patients and predict the impact of
some events in the care process.
They concluded that the use of Bayesian networks in medical analysis could be useful
for exploring data and finding hidden relationships between events or characteristics of the
sample. It is a first approach for discussing hypotheses between clinicians and statistical
experts. The main limit of these tools is the necessity of having enough data to find regularity
in the relationships.
2.2 KNOWLEDGE DISCOVERY IN DATABASES: AN OVERVIEW
AUTHORS
WILLIAM J. FRAWLEY
GREGORY PIATETSKY-SHAPIRO
CHRISTOPHER J. MATHEUS
In this paper [2] the authors stated that after a decade of fundamental interdisciplinary
research in machine learning, the spadework in this field has been done; the 1990s should see the
widespread exploitation of knowledge discovery as an aid to assembling knowledge bases. The
contributors to the AAAI Press book Knowledge Discovery in Databases were excited at the
potential benefits of this research. The editors hope that some of this excitement will
communicate itself to AI Magazine readers of this article.
It has been estimated that the amount of information in the world doubles every 20
months. The size and number of databases probably increases even faster. In 1989, the total
number of databases in the world was estimated at five million, although most of them are small
DBASE III databases. The automation of business activities produces an ever-increasing stream
of data because even simple transactions, such as a telephone call, the use of a credit card, or a
medical test, are typically recorded in a computer.
Scientific and government databases are also rapidly growing. The National Aeronautics
and Space Administration already has much more data than it can analyze. Earth observation
satellites, planned for the 1990s, are expected to generate one terabyte (10^15 bytes) of data every
day, more than all previous missions combined. At a rate of one picture each second, it would
take a person several years (working nights and weekends) just to look at the pictures generated
in one day. In biology, the federally funded Human Genome project will store thousands of bytes
for each of the several billion genetic bases.
Closer to everyday lives, the 1990 U.S. census data of a million million bytes encode
patterns that in hidden ways describe the lifestyles and subcultures of today’s United States.
What are we supposed to do with this flood of raw data? Clearly, little of it will ever be seen by
human eyes. If it will be understood at all, it will have to be analyzed by computers. Although
simple statistical techniques for data analysis were developed long ago, advanced techniques for
intelligent data analysis are not yet mature.
As a result, there is a growing gap between data generation and data understanding. At
the same time, there is a growing realization and expectation that data, intelligently analyzed and
presented, will be a valuable resource to be used for a competitive advantage. The computer
science community is responding to both the scientific and practical challenges presented by the
need to find the knowledge adrift in the flood of data.
In assessing the potential of AI technologies, Michie (1990), a leading European expert
on machine learning, predicted that “the next area that is going to explode is the use of machine
learning tools as a component of large-scale data analysis.” A recent National Science
Foundation workshop on the future of database research ranked data mining among the most
promising research topics for the 1990s.
Some research methods are already well enough developed to have been made part of
commercially available software. Several expert system shells use variations of ID3 for inducing
rules from examples. Other systems use inductive, neural net, or genetic learning approaches to
discover patterns in personal computer databases. Many forward-looking companies are using
these and other tools to analyze their databases for interesting and useful patterns.
American Airlines searches its frequent flyer database to find its better customers,
targeting them for specific marketing promotions. Farm Journal analyzes its subscriber database
and uses advanced printing technology to custom-build hundreds of editions tailored to particular
groups. Several banks, using patterns discovered in loan and credit histories, have derived better
loan approval and bankruptcy prediction methods. General Motors is using a database of
automobile trouble reports to derive diagnostic expert systems for various models. Packaged-
goods manufacturers are searching the supermarket scanner data to measure the effects of their
promotions and to look for shopping patterns.
A combination of business and research interests has produced increasing demands for,
as well as increased activity to provide, tools and techniques for discovery in databases. This
book is the first to bring together leading-edge research from around the world on this topic. It
spans many different approaches to discovery, including inductive learning, Bayesian statistics,
semantic query optimization, knowledge acquisition for expert systems, information theory, and
fuzzy sets.
The book is aimed at those interested or involved in computer science and the
management of data, to both inform and inspire further research and applications. It will be of
particular interest to professionals working with databases and management information systems
and to those applying machine learning to real-world problems.
What Is Knowledge Discovery? There is an immense diversity of current research on
knowledge discovery in databases. To provide a point of reference for this research, we begin
here by defining and explaining relevant terms.
Definition of Knowledge Discovery:
Knowledge discovery is the nontrivial extraction of implicit, previously unknown, and
potentially useful information from data. Given a set of facts (data) F, a language L, and some
measure of certainty C, we define a pattern as a statement S in L that describes relationships
among a subset FS of F with a certainty c, such that S is simpler (in some sense) than the
enumeration of all facts in FS.
A pattern that is interesting (according to a user-imposed interest measure) and certain
enough (again according to the user’s criteria) is called knowledge. The output of a program that
monitors the set of facts in a database and produces patterns in this sense is discovered
knowledge. These definitions about the language, the certainty, and the simplicity and
interestingness measures are intentionally vague to cover a wide variety of approaches.
Collectively, these terms capture our view of the fundamental characteristics of discovery in
databases. In the following paragraphs, we summarize the connotations of these terms and
suggest their relevance to the problem of knowledge discovery in databases.
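The definition above can be made concrete with a toy example. Everything here is illustrative: F is a handful of invented records, and the rule and threshold are arbitrary stand-ins for the user-imposed certainty and interest measures.

```python
# Toy illustration of the definition: a pattern is a statement about a subset
# F_S of the facts F, held with some certainty c. The candidate pattern here
# is the rule "exercise-induced angina -> disease"; its certainty is the
# fraction of rule-matching facts that confirm it.
F = [
    {"exang": 1, "disease": 1},
    {"exang": 1, "disease": 1},
    {"exang": 1, "disease": 0},
    {"exang": 0, "disease": 0},
    {"exang": 0, "disease": 1},
]

matching = [f for f in F if f["exang"] == 1]            # subset F_S covered by the rule
confirming = [f for f in matching if f["disease"] == 1]
certainty = len(confirming) / len(matching)             # certainty c

threshold = 0.6  # user-imposed certainty/interest criterion
is_knowledge = certainty >= threshold                   # pattern counts as knowledge
print(f"certainty c = {certainty:.2f}, counts as knowledge: {is_knowledge}")
```

Here the rule covers three facts and two confirm it, so c = 2/3, which clears the (arbitrary) threshold and would count as discovered knowledge under this interest measure.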
They noted that one of the earliest discovery processes was encountered by Jonathan
Swift’s Gulliver in his visit to the Academy of Lagado. The “Project for improving speculative
Knowledge by practical and mechanical operations” was generating sequences of words by
random permutations and “where they found three or four Words that might make Part of a
Sentence, they dictated them to . . . Scribes.” This process, although promising to produce many
interesting sentences in the (very) long run, is rather inefficient and was recently proved to be
NP-hard.
Key attribute
1. PatientID – patient’s identification number
Input attributes
1. Sex (value 1: male; value 0: female)
2. Chest Pain Type (value 1: typical angina; value 2: atypical angina; value 3:
non-anginal pain; value 4: asymptomatic)
3. Fasting Blood Sugar (value 1: > 120 mg/dl; value 0: ≤ 120 mg/dl)
4. Restecg – resting electrocardiographic results (value 0: normal; value 1: having ST-T wave
abnormality; value 2: showing probable or definite left ventricular hypertrophy)
5. Exang – exercise-induced angina (value 1: yes; value 0: no)
6. Slope – the slope of the peak exercise ST segment (value 1: upsloping; value 2: flat;
value 3: downsloping)
7. CA – number of major vessels colored by fluoroscopy (values 0–3)
8. Thal (value 3: normal; value 6: fixed defect; value 7: reversible defect)
9. Trest Blood Pressure (mm Hg on admission to the hospital)
10. Serum Cholesterol (mg/dl)
11. Thalach – maximum heart rate achieved
12. Oldpeak – ST depression induced by exercise relative to rest
13. Age in years
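As a hedged sketch of how the listed attributes might be loaded and decoded in practice (the file name, separator, and column order are assumptions, not given by the text):

```python
# Sketch of loading the attribute set above into a typed table with pandas.
import pandas as pd

# Assumed column order for a StatLog-style heart file (hypothetical).
columns = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target",
]

# df = pd.read_csv("heart.dat", sep=r"\s+", names=columns)  # hypothetical path

# For illustration, decode one categorical attribute exactly as the list defines it:
cp_labels = {1: "typical angina", 2: "atypical angina",
             3: "non-anginal pain", 4: "asymptomatic"}
df = pd.DataFrame({"cp": [1, 4, 2]})
df["cp_label"] = df["cp"].map(cp_labels)
print(df)
```

The remaining coded attributes (restecg, slope, thal, and so on) can be decoded with the same `map` pattern before training.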
The models are trained and validated against a test dataset. Lift Chart and Classification
Matrix methods are used to evaluate the effectiveness of the models. All three models are able to
extract patterns in response to the predictable state. The most effective model to predict patients
with heart disease appears to be Naïve Bayes followed by Neural Network and Decision Trees.
Five mining goals are defined based on business intelligence and data exploration. The
goals are evaluated against the trained models. All three models could answer complex queries,
each with its own strength with respect to ease of model interpretation, access to detailed
information and accuracy. Naïve Bayes could answer four out of the five goals; Decision Trees,
three; and Neural Network, two. Although not the most effective model, Decision Trees produce
results that are easier to read and interpret. The drill-through feature for accessing detailed
patient profiles is only available in Decision Trees.
Naïve Bayes fared better than Decision Trees as it could identify all the significant
medical predictors. The relationship between attributes produced by Neural Network is more
difficult to understand. IHDPS can be further enhanced and expanded. For example, it can
incorporate other medical attributes besides the 15 listed in Figure 2.1. It can also incorporate
other data mining techniques, e.g., Time Series, Clustering and Association Rules. Continuous
data can also be used instead of just categorical data. Another area is to use Text Mining to mine
the vast amount of unstructured data available in healthcare databases. Another challenge would
be to integrate data mining and text mining [8].
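The "Classification Matrix" evaluation mentioned above is a confusion matrix. A minimal sketch with scikit-learn, using invented labels and predictions purely for illustration:

```python
# Confusion matrix and accuracy for a set of predictions (invented data).
from sklearn.metrics import confusion_matrix, accuracy_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual classes
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model's predictions

cm = confusion_matrix(y_true, y_pred)   # rows: actual, columns: predicted
acc = accuracy_score(y_true, y_pred)
print(cm)
print(f"accuracy = {acc:.2f}")
```

Reading the matrix row by row shows where each model misclassifies (false positives vs. false negatives), which is the basis for comparing Naïve Bayes, Decision Trees, and Neural Network above.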
2.6 INTELLIGENT AND EFFECTIVE HEART ATTACK PREDICTION SYSTEM
USING DATA MINING AND ARTIFICIAL NEURAL NETWORK
AUTHORS
B.P. SHANTHAKUMAR
Y.S. KUMARASWAMY
In this paper [6] the authors stated that the diagnosis of diseases is a vital and intricate job
in medicine. The recognition of heart disease from diverse features or signs is a multi-layered
problem that is not free from false assumptions and is frequently accompanied by impulsive
effects. Thus, the attempt to exploit the knowledge and experience of several specialists and the
clinical screening data of patients collected in databases to assist the diagnosis procedure is
regarded as a valuable option. This research work extends the authors' previous research on an
intelligent and effective heart attack prediction system using a neural network. A proficient
methodology for
the extraction of significant patterns from the heart disease warehouses for heart attack
prediction has been presented.
Initially, the data warehouse is pre-processed to make it suitable for the mining
process. Once preprocessing is complete, the heart disease warehouse is clustered with the aid
of the K-means clustering algorithm, which extracts the data relevant to heart attacks from
the warehouse. The frequent patterns applicable to heart disease are then mined from the
extracted data with the aid of the MAFIA algorithm. In addition, the patterns vital to heart
attack prediction are selected on the basis of their computed significance weightage. The neural
network is trained with the selected significant patterns for the effective prediction of heart
attacks. They
have employed the Multi-layer Perceptron Neural Network with Back-propagation as the
training algorithm. The results thus obtained have illustrated that the designed prediction system
is capable of predicting the heart attack effectively.
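The pipeline summarised above (K-means extraction followed by an MLP trained with backpropagation) can be roughed out as follows. This is only a sketch, not the authors' system: the MAFIA frequent-pattern mining and significance-weighting steps are omitted, synthetic data stands in for the warehouse, and which cluster to keep is chosen arbitrarily.

```python
# Rough sketch: K-means to isolate a subset of records, then a multi-layer
# perceptron (trained with backpropagation) fitted on that subset.
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification

# Synthetic stand-in for the heart disease warehouse (13 attributes).
X, y = make_classification(n_samples=300, n_features=13, random_state=1)

# Step 1: cluster the warehouse; keep one cluster as the "relevant" subset
# (arbitrary choice here; the paper selects data appropriate to heart attack).
km = KMeans(n_clusters=2, n_init=10, random_state=1).fit(X)
subset = km.labels_ == 0
X_sub, y_sub = X[subset], y[subset]

# Step 2: MLP with backpropagation trained on the extracted subset.
mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=1)
mlp.fit(X_sub, y_sub)
print(f"training accuracy on extracted subset: {mlp.score(X_sub, y_sub):.2f}")
```

In the paper's full pipeline the frequent patterns mined by MAFIA, not the raw records, would be weighted and fed to the network.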
2.7 PERFORMANCE EVALUATION OF K-MEANS AND HIERARCHICAL
CLUSTERING IN TERMS OF ACCURACY AND RUNNING TIME
AUTHORS
NIDHI SINGH
DIVAKAR SINGH
In this paper [7] the authors stated that as a large number of modern techniques for
scientific data collection evolve, large amounts of data are accumulating in various
databases. Systematic data analysis methods are needed to extract useful information from
rapidly growing databanks. Cluster analysis is one of the main analytical methods in data
mining, and the k-means clustering algorithm is the most widely used for many applications.
Clustering algorithms fall into two categories: partition and hierarchical. This paper discusses
one partition clustering algorithm (k-means) and one hierarchical clustering algorithm
(agglomerative). The k-means
algorithm has higher efficiency and scalability and converges fast when dealing with large data
sets. Hierarchical clustering constructs a hierarchy of clusters by either repeatedly merging two
smaller clusters into a larger one or splitting a larger cluster into smaller ones. Using the
WEKA data mining tool, the authors calculated the performance of the k-means and hierarchical
clustering algorithms on the basis of accuracy and running time.
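The accuracy/running-time comparison described above can be approximated with scikit-learn in place of WEKA. A sketch, with the adjusted Rand index standing in for accuracy against a known grouping on synthetic data:

```python
# Time k-means and agglomerative clustering on the same data and compare
# their cluster assignments to the true grouping.
import time
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

X, y = make_blobs(n_samples=1000, centers=3, random_state=0)

for name, algo in [
    ("k-means", KMeans(n_clusters=3, n_init=10, random_state=0)),
    ("agglomerative", AgglomerativeClustering(n_clusters=3)),
]:
    t0 = time.perf_counter()
    labels = algo.fit_predict(X)
    elapsed = time.perf_counter() - t0
    score = adjusted_rand_score(y, labels)  # agreement with the true grouping
    print(f"{name}: ARI={score:.3f}, time={elapsed:.4f}s")
```

On larger datasets the running-time gap widens in k-means' favour, which is consistent with the efficiency and scalability claims above (agglomerative clustering is quadratic or worse in the number of objects).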
As a result of modern methods for scientific data collection, huge quantities of data are
getting accumulated at various databases. Cluster analysis [9] is one of the major data analysis
methods, which helps to identify the natural grouping in a set of data items. The K-means
clustering algorithm, proposed by MacQueen in 1967, is a partition-based cluster analysis
method. Clustering is a way of classifying raw data reasonably and searching for the hidden
patterns that may exist in datasets [10]. It is a process of grouping data objects into
disjoint clusters so that data in the same cluster are similar, while data belonging to
different clusters differ. K-means is a numerical, unsupervised, non-deterministic, iterative
method. It is simple and very fast, so in many practical applications it has proved to be a
very effective way to produce good clustering results. The demand for organizing sharply
increasing amounts of data and learning valuable information from them has made clustering
techniques widely applied in application areas such as artificial intelligence, biology,
customer relationship management, data compression, data mining, information retrieval, image
processing, machine learning, marketing, medicine, pattern recognition, psychology, statistics,
and so on.
Clustering methods can be divided into two general classes, designated supervised and
unsupervised clustering. In this paper, we focus on unsupervised clustering which may again be
separated into two major categories: partition clustering and hierarchical clustering. There are
many algorithms for partition clustering category, such as kmeans clustering (MacQueen 1967),
k-medoid clustering, genetic k-means algorithm (GKA), Self-Organizing Map (SOM) and also
graph-theoretical methods (CLICK, CAST). Among those methods, K-means clustering is the
most popular one because of simple algorithm and fast execution speed. Hierarchical clustering
methods are among the first methods developed and analyzed for clustering problems. There are
two main approaches. (i) The agglomerative approach, which builds a larger cluster by merging
two smaller clusters in a bottom-up fashion. The clusters so constructed form a binary tree;
individual objects are the leaf nodes and the root node is the cluster that has all data objects. (ii)
The divisive approach, which splits a cluster into two smaller ones in a top-down fashion. All
clusters so constructed also form a binary tree.
The process of the k-means algorithm: this part briefly describes the standard k-means
algorithm. K-means is a typical clustering algorithm in data mining and is widely used for
clustering large sets of data. It is one of the simplest unsupervised learning algorithms for
solving the well-known clustering problem. As a partitioning clustering algorithm, the method
classifies the given data objects into k different clusters iteratively, converging to a local
minimum, so that the generated clusters are compact and independent. The algorithm consists of
two separate phases. The first phase selects k centers randomly, where the value k is fixed in
advance. The next phase assigns each data object to the nearest center; Euclidean distance is
generally used to determine the distance between each data object and the cluster centers. When
all the data objects have been assigned to clusters, the first step is complete and an early
grouping is done; the averages of the early-formed clusters are then recalculated. This
iterative process continues until the criterion function reaches its minimum. Supposing that
the target object is x and x̄i denotes the average of cluster Ci, the criterion function is
defined as follows:

E = Σ(i=1..k) Σ(x∈Ci) |x − x̄i|²    (1)

where E is the sum of the squared error of all objects in the database.
The distance measure in the criterion function is the Euclidean distance, which is used to
determine the nearest distance between each data object and a cluster center. The Euclidean
distance d(x, y) between one vector x = (x1, x2, …, xn) and another vector y = (y1, y2, …, yn)
is obtained as follows:

d(x, y) = √((x1 − y1)² + (x2 − y2)² + … + (xn − yn)²)    (2)
The k-means clustering algorithm always converges to a local minimum. Before the algorithm
converges, the calculations of distances and cluster centers are repeated in a loop executed a
number of times, where the positive integer t is known as the number of k-means iterations. The
precise value of t varies depending on the initial starting cluster centers. Since the
distribution of data points affects the new cluster centers, the computational time complexity
of the k-means algorithm is O(nkt), where n is the number of data objects, k is the number of
clusters, and t is the number of iterations, usually with k << n and t << n. The reasons for
choosing the k-means algorithm for study are its popularity and the following properties:

Its time complexity is O(nkl), where n is the number of patterns, k is the number of clusters,
and l is the number of iterations taken by the algorithm to converge.

It is order independent: for a given initial seed set of cluster centers, it generates the same
partition of the data irrespective of the order in which the patterns are presented to the
algorithm.

Its space complexity is O(n + k); it requires additional space to store the data matrix.
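The two-phase loop and criterion (1) described above can be sketched from scratch as follows. This is a minimal illustration on invented data; production implementations add smarter initialisation (e.g. k-means++) and more careful convergence tests.

```python
# From-scratch sketch of k-means: pick k random centers, assign each object
# to its nearest center by Euclidean distance, recompute centers as cluster
# means, and stop when the squared-error criterion E stops decreasing.
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # phase 1: k random centers
    prev_E = np.inf
    for t in range(max_iter):  # t counts the k-means iterations
        # phase 2: assign each object to the nearest center (Euclidean distance, eq. 2)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each center as the mean of its cluster (keep old center if empty)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
        E = ((X - centers[labels]) ** 2).sum()  # criterion function E (eq. 1)
        if E >= prev_E:                          # converged to a local minimum
            break
        prev_E = E
    return labels, centers, E

# Two well-separated synthetic groups for illustration.
X = np.vstack([np.random.default_rng(1).normal(0, 1, (50, 2)),
               np.random.default_rng(2).normal(6, 1, (50, 2))])
labels, centers, E = kmeans(X, k=2)
print(f"final squared error E = {E:.2f}")
```

Each pass through the loop does the O(nk) distance and center calculations, so the total cost matches the O(nkt) complexity stated above.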
REFERENCES
[1] Franck Le Duff, CristianMunteanb, Marc Cuggiaa and Philippe Mabob, “Predicting Survival
Causes After Out of Hospital Cardiac Arrest using Data Mining Method”, Studies in Health
Technology and Informatics, Vol. 107, No. 2, pp. 1256-1259, 2004.
[2] W.J. Frawley and G. Piatetsky-Shapiro, “Knowledge Discovery in Databases: An Overview”,
AI Magazine, Vol. 13, No. 3, pp. 57-70, 1996.
[3] Kiyong Noh, HeonGyu Lee, Ho-Sun Shon, Bum Ju Lee and Keun Ho Ryu, “Associative
Classification Approach for Diagnosing Cardiovascular Disease”, Intelligent Computing in
Signal Processing and Pattern Recognition, Vol. 345, pp. 721-727, 2006.
[4] Latha Parthiban and R. Subramanian, “Intelligent Heart Disease Prediction System using
CANFIS and Genetic Algorithm”, International Journal of Biological, Biomedical and Medical
Sciences, Vol. 3, No. 3, pp. 1-8, 2008.
[5] Sellappan Palaniappan and Rafiah Awang, “Intelligent Heart Disease Prediction System
using Data Mining Techniques”, International Journal of Computer Science and Network
Security, Vol. 8, No. 8, pp. 1-6, 2008.
[6] Shantakumar B. Patil and Y.S. Kumaraswamy, “Intelligent and Effective Heart Attack
Prediction System using Data Mining and Artificial Neural Network”, European Journal of
Scientific Research, Vol. 31, No. 4, pp. 642-656, 2009.
[7] Nidhi Singh and Divakar Singh, “Performance Evaluation of K-Means and Hierarchal
Clustering in Terms of Accuracy and Running Time”, Ph.D Dissertation, Department of
Computer Science and Engineering, Barkatullah University Institute of Technology, 2012.
[8] Weiguo, F., Wallace, L., Rich, S., Zhongju, Z.: “Tapping the Power of Text Mining”,
Communication of the ACM. 49(9), 77-82, 2006.
[9] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann
Publishers, An Imprint of Elsevier, 2006.
[10] Z. Huang, "Extensions to the k-means Algorithm for Clustering Large Data Sets with
Categorical Values", Data Mining and Knowledge Discovery, Vol. 2, pp. 283-304, 1998.
[11] The Antiarrhythmics versus Implantable Defibrillators (AVID) Investigators, "A Comparison
of Antiarrhythmic-Drug Therapy with Implantable Defibrillators in Patients Resuscitated from
Near-Fatal Ventricular Arrhythmias", N Engl J Med, Vol. 337, No. 22, pp. 1576-1583, 1997.
[12] R. Wu, W. Peters and M.W. Morgan, "The Next Generation Clinical Decision Support: Linking
Evidence to Best Practice", Journal of Healthcare Information Management, Vol. 16, No. 4,
pp. 50-55, 2002.
[13] Mary K. Obenshain, "Application of Data Mining Techniques to Healthcare Data", Infection
Control and Hospital Epidemiology, Vol. 25, No. 8, pp. 690-695, Aug. 2004.
[14] G. Camps-Valls, L. Gomez-Chova, J. Calpe-Maravilla, J.D. Martin-Guerrero, E. Soria-Olivas,
L. Alonso-Chorda and J. Moreno, "Robust Support Vector Method for Hyperspectral Data
Classification and Knowledge Discovery", IEEE Trans. Geosci. Remote Sens., Vol. 42, No. 7,
pp. 1530-1542, July 2004.
[15] Obenshain, M.K: “Application of Data Mining Techniques to Healthcare Data”, Infection
Control and Hospital Epidemiology, 25(8), 690–695, 2004.
[16] Wu, R., Peters, W., Morgan, M.W.: “The Next Generation Clinical Decision Support:
Linking Evidence to Best Practice”, Journal Healthcare Information Management. 16(4), 50-55,
2002.
[17] Charly, K.: “Data Mining for the Enterprise”, 31st Annual Hawaii Int. Conf. on System
Sciences, IEEE Computer, 7, 295-304, 1998.
[18] Blake, C.L., Mertz, C.J.: “UCI Machine Learning Databases”,
http://mlearn.ics.uci.edu/databases/heart-disease/, 2004.
[19] Mohd, H., Mohamed, S. H. S.: “Acceptance Model of Electronic Medical Record”, Journal
of Advancing Information and Management Studies. 2(1), 75-92, 2005.