
N. DEEPIKA et al. / (IJAEST) INTERNATIONAL JOURNAL OF ADVANCED ENGINEERING SCIENCES AND TECHNOLOGIES, Vol No. 11, Issue No. 2, 253 - 257

Association rule for classification of Heart-attack patients
N. DEEPIKA, M.Tech, Department of CSE, ATRI, Parvathapur, Uppal, Hyderabad, India. e-mail: nagilladeepika@gmail.com
K. CHANDRA SHEKAR, Sr. Asst. Prof., Department of CSE, ATRI, Parvathapur, Uppal, Hyderabad, India. e-mail: chandhra2k7@gmail.com
D. SUJATHA, Assoc. Prof. and HOD, Department of CSE, ATRI, Parvathapur, Uppal, Hyderabad, India. e-mail: sujatha.dandu@gmail.com

Abstract— Data mining is the non-trivial extraction of implicit, previously unknown and potentially useful information from data. Data mining technology provides a user-oriented approach to novel and hidden patterns in the data. This paper presents an effective heart attack prediction system using Pruning-Classification Association Rule (PCAR), an efficient approach for mining association rules, together with a methodology for generating association rules from heart disease warehouses for heart attack prediction. Initially, the Pima Indian heart attack data warehouse is pre-processed to make it suitable for the mining process. Once preprocessing is complete, the continuous-valued attributes of the warehouse are discretized with a modified equal-width binning approach; the approximate width of each interval is chosen based on the opinion of a medical expert and is provided as an input parameter to the model. In this way, numeric attributes are converted into categorical form. The frequent patterns applicable to heart disease are then mined from the extracted data with the aid of the PCAR algorithm, and the patterns vital to heart attack prediction are selected on the basis of the computed significant class labels. The classification technique is trained with the selected class labels for effective prediction of heart attack. Lastly, we generate association rules, which are useful for identifying general associations in the data. The results obtained illustrate that the designed prediction system is capable of predicting heart attack effectively.

Keywords- Association rule, Binning, Data Mining, Frequent Patterns, Heart Disease, PCAR (Pruning-Classification Association Rule), Pre-processing.

ISSN: 2230-7818

I. INTRODUCTION
The healthcare industry collects huge amounts of healthcare data which, unfortunately, are not "mined" to discover hidden information for effective decision making. Discovery of hidden patterns and relationships often goes unexploited. Advanced data mining techniques can help remedy this situation. Data mining is "the non-trivial extraction of implicit, previously unknown and potentially useful information about data" [1], and data mining technology provides a user-oriented approach to novel and hidden patterns in the data. Using medical profiles such as age, sex, blood pressure and blood sugar, it can predict the likelihood of patients getting a heart disease. It enables significant knowledge, e.g. patterns and relationships between medical factors related to heart disease, to be established.

The discovered knowledge can be used by healthcare administrators to improve the quality of service. A wide variety of areas, including marketing, customer relationship management, engineering, medicine, crime analysis, expert prediction, Web mining and mobile computing, among others, utilize data mining [2]. Numerous fields associated with medical services, such as prediction of the effectiveness of surgical procedures, medical tests and medication, and the discovery of relationships among clinical and diagnosis data, also employ data mining methodologies [3]. Therefore, data mining has developed into a vital domain in healthcare [4]. It is possible to predict the efficiency of medical treatments by building data mining applications. Real-life data mining applications are attractive since they repeatedly provide data miners with a varied set of problems, and working on heart disease patient databases is one such real-life application. It therefore appears reasonable to try utilizing the knowledge and experience of several specialists collected in databases towards assisting the diagnosis process [2], [5]. In the recent past, data mining techniques were utilized by several authors to present diagnosis approaches for diverse types of heart diseases [6, 7, 8, 9, 10, 11].

This paper presents an effective heart attack prediction system using PCAR. The Pima Indian heart attack data warehouse is first pre-processed, and its continuous-valued attributes are discretized with a modified equal-width binning approach whose interval width is chosen from a medical expert's opinion. The frequent patterns applicable to heart disease are then mined with the PCAR algorithm, the patterns vital to heart attack prediction are selected on the basis of the computed significant class labels, and the classification technique is trained with the selected class labels.

Many existing algorithms for mining association rules identify frequent item sets either by bottom-up combination of smaller frequent item sets or by top-down decomposition of larger infrequent item sets; as a result, they generate large volumes of candidate item sets.

@ 2011 http://www.ijaest.iserp.org. All rights Reserved.


Apriori is the best-known algorithm to mine association rules. It obtains large frequent item sets through the combination and pruning of small frequent item sets. Apriori uses a bottom-up approach, where frequent subsets are extended one item at a time (a step known as candidate generation) and groups of candidates are tested against the data; the algorithm terminates when no further successful extensions are found. It uses a breadth-first search strategy to count the support of item sets, a candidate generation function that exploits the downward closure property of support, and a hash tree structure to count candidate item sets efficiently. The principle of the algorithm is as follows: it first calculates the support of all item sets in the candidate item set Ck obtained from Lk-1; if the support of an item set is greater than or equal to the minimum support, that candidate k-item set is a frequent k-item set, that is, a member of Lk. It then combines all frequent k-item sets into a new candidate item set Ck+1, level by level, until it finds the large frequent item sets, as shown in Fig. 1.

Steps to perform Apriori algorithm:
1. Generating item sets that pass a minimum support threshold.
2. Generating rules that pass a minimum confidence threshold.

Fig-1: Procedure of the Apriori algorithm
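A minimal Python sketch of the two Apriori steps listed above follows. It is our illustration, not code from this paper: it counts candidates with a plain dictionary instead of the hash tree mentioned above, treats min_support as an absolute record count, and the records and item names (Age-Young, DN-High, and so on) are hypothetical bin labels.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Step 1: return {frozenset(itemset): support_count} for every frequent item set.

    min_support is an absolute count (number of records).
    """
    transactions = [frozenset(t) for t in transactions]

    def frequent_only(candidates):
        # One pass over the data counts every candidate, then drops those below min_support.
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # L1: frequent 1-item sets.
    level = frequent_only({frozenset([i]) for t in transactions for i in t})
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join: combine frequent (k-1)-item sets into candidate k-item sets (Ck).
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune (downward closure): every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = frequent_only(candidates)  # Lk
        frequent.update(level)
        k += 1
    return frequent


def rules_from(frequent, min_confidence):
    """Step 2: rules X -> Y with confidence = support(X u Y) / support(X)."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # lhs is frequent by downward closure, so its support is in the dict.
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


# Hypothetical binned patient records (not the paper's data).
records = [
    {"Age-Young", "BP-High", "MHR-Severe", "DN-High"},
    {"Age-Young", "BP-High", "MHR-Severe", "DN-High"},
    {"Age-Old", "BP-Normal", "MHR-Normal", "DN-Low"},
    {"Age-Young", "BP-High", "MHR-Normal", "DN-Low"},
]
frequent = apriori(records, min_support=2)
rules = rules_from(frequent, min_confidence=0.9)
```

With min_support=2 the sketch finds 18 frequent item sets for these four records, and rules_from keeps, for example, MHR-Severe -> DN-High (confidence 1.0) while dropping Age-Young -> DN-High (confidence 2/3).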
This paper presents a novel and more efficient method, namely Pruning-Classification Association Rule (PCAR), which is derived from an analysis of the Apriori algorithm. PCAR first deletes infrequent items from item sets, then classifies item sets based on their frequency, and finally discovers frequent item sets; it combines minimum frequency items with minimum frequency item sets. Since the supersets of infrequent items are themselves infrequent item sets, operation time and memory requirements decrease accordingly. The number of candidate item sets is greatly reduced, and item sets need not be combined or decomposed. The method therefore has a significant advantage in mining association rules over large volumes of items with small item-set frequencies. Experiments show that PCAR outperforms the well-known Apriori algorithm, and the acquired results illustrate the efficiency of the designed system in predicting heart attack.
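The PCAR description above leaves its data structures open, so the following Python sketch is only one hypothetical reading of the three steps (delete infrequent items, classify the pruned item sets by frequency, discover frequent item sets). The function name pcar_frequent_itemsets and the subset-enumeration strategy are our assumptions, not the authors' published procedure.

```python
from collections import Counter
from itertools import combinations


def pcar_frequent_itemsets(transactions, min_support):
    """Hypothetical PCAR-style miner; min_support is an absolute record count."""
    # Step 1: delete infrequent items from every item set. Any superset of an
    # infrequent item is itself infrequent, so nothing frequent is lost.
    item_counts = Counter(i for t in transactions for i in set(t))
    keep = {i for i, n in item_counts.items() if n >= min_support}
    pruned = [frozenset(t) & keep for t in transactions]

    # Step 2: classify the pruned item sets by frequency, i.e. merge identical
    # item sets and remember how many records each one stands for.
    classes = Counter(t for t in pruned if t)

    # Step 3: discover frequent item sets. Each distinct item set passes its
    # frequency to all of its subsets, so no candidate join step is needed.
    support = Counter()
    for itemset, freq in classes.items():
        items = sorted(itemset)
        for k in range(1, len(items) + 1):
            for subset in combinations(items, k):
                support[frozenset(subset)] += freq
    return {s: n for s, n in support.items() if n >= min_support}


# Hypothetical binned patient records (not the paper's data).
records = [
    {"Age-Young", "BP-High", "MHR-Severe", "DN-High"},
    {"Age-Young", "BP-High", "MHR-Severe", "DN-High"},
    {"Age-Old", "BP-Normal", "MHR-Normal", "DN-Low"},
    {"Age-Young", "BP-High", "MHR-Normal", "DN-Low"},
]
frequent = pcar_frequent_itemsets(records, min_support=2)
```

Enumerating every subset of each distinct pruned record is exponential in record length; this is tolerable when each binned record carries only one item per attribute. On these four records the sketch finds 18 frequent item sets, including {Age-Young, BP-High} with support 3, without generating any join candidates.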
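The remaining sections of the paper are organized as follows. Section 2 presents a brief review of some of the works on heart disease diagnosis. Section 3 introduces heart disease and its effects. Section 4 describes how the frequent patterns applicable to heart disease are mined from the extracted data with the aid of the PCAR algorithm. The experimental results are described in Section 5, and the conclusions are summed up in Section 6.

II. REVIEW OF RELATED BACKGROUND LITERATURE
Numerous works in the literature on heart disease diagnosis using data mining techniques have motivated our work; some of them are discussed below. Data mining methods may aid clinicians in predicting the survival of patients and in adapting their practices accordingly. The problem of identifying constrained association rules for heart disease prediction was studied by Carlos Ordonez [12]. Three constraints were introduced to decrease the number of patterns: the first requires the attributes to appear on only one side of the rule, the second segregates attributes into uninteresting groups, and the last restricts the number of attributes in a rule. Experiments illustrated that the constraints reduced the number of discovered rules remarkably, besides decreasing the running time. Two groups of rules predicted the presence or absence of heart disease in four specific heart arteries.

III. HEART DISEASE
The term heart disease encompasses the diverse diseases that affect the heart. Heart disease kills one person every 34 seconds in the United States. Coronary heart disease, cardiomyopathy and cardiovascular disease are some categories of heart diseases. The term "cardiovascular disease" includes a wide range of conditions that affect the heart and the blood vessels and the manner in which blood is pumped and circulated through the body. Cardiovascular disease (CVD) results in severe illness, disability, and death. Narrowing of the coronary arteries reduces the blood and oxygen supply to the heart and leads to coronary heart disease (CHD). Myocardial infarctions, generally known as heart attacks, and angina pectoris, or chest pain, are encompassed in CHD. A sudden blockage of a coronary artery, generally due to a blood clot, results in a heart attack. Chest pains arise when the blood received by the heart muscles is inadequate.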

High blood pressure, coronary artery disease, stroke, or rheumatic fever/rheumatic heart disease are the various forms of cardiovascular disease. The World Health Organization has estimated that 12 million deaths occur worldwide every year due to cardiovascular diseases.

IV. PROPOSED METHOD
The extraction of significant patterns from the heart disease data warehouse is presented in this section. The heart disease data warehouse contains the screening clinical data of heart patients. Initially, the data warehouse is preprocessed to make the mining process more efficient: in the first stage of our proposed study, we used preprocessing in order to handle missing values. Later, we applied equal interval binning with approximate values based on medical expert advice to the Pima Indian heart attack data. Lastly, we applied the PCAR algorithm to generate the rules. The significant items are calculated for all frequent patterns with the aid of the proposed approach. We also consider the important measure of confidence: the frequent patterns with confidence greater than a predefined threshold are chosen. These chosen frequent patterns can be used in the design and development of a heart attack prediction system.

A. Data Set
The Pima Indian heart attack dataset used was obtained from the UCI machine learning repository [13]. Characteristics of the patients, such as the number of times of chest pain and age in years, were recorded. Some other important parameters need to be checked every 2 hours: thalach (maximum heart rate achieved), blood pressure (mm Hg), serum cholesterol (mg/dl), and electrocardiographic result.

B. Data Preprocessing
In the real world, data is not always complete, and in the case of medical data this is always true. To remove the inconsistencies associated with the data, we use data preprocessing. Cleaning and filtering of the data may need to be carried out with respect to the data and the data mining algorithm employed, so as to avoid the creation of deceptive or inappropriate rules or patterns [14]. The actions comprised in the pre-processing of a data set are the removal of duplicate records, normalizing the values used to represent information in the database, accounting for missing data points, and removing unneeded data fields. Moreover, it may be essential to combine the data so as to reduce the number of data sets, besides minimizing the memory and processing resources required by the data mining algorithm [15].
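The preprocessing actions listed in subsection B can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: records are dicts of raw string values, "?" (or an empty/absent field) marks a missing value, all kept attributes are numeric, and dropping incomplete rows is only one way of accounting for missing data (imputation would be another). The field names age, trestbps and thalach follow the attribute names in this paper's binning table; the function name preprocess is hypothetical.

```python
def preprocess(records, keep_fields, missing=("", "?", None)):
    """Clean raw patient records (list of dicts) before binning and mining."""
    seen = set()
    cleaned = []
    for rec in records:
        # Remove unneeded data fields.
        row = {f: rec.get(f) for f in keep_fields}
        # Account for missing data points; this sketch simply drops incomplete rows.
        if any(v in missing for v in row.values()):
            continue
        # Normalize the representation: every kept attribute is assumed numeric.
        row = {f: float(v) for f, v in row.items()}
        # Remove duplicate records (identical values in every kept field).
        key = tuple(row[f] for f in keep_fields)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


raw = [
    {"id": 1, "age": "35", "trestbps": "95", "thalach": "130"},
    {"id": 2, "age": "35", "trestbps": "95", "thalach": "130"},  # duplicate once id is dropped
    {"id": 3, "age": "61", "trestbps": "?", "thalach": "110"},   # missing blood pressure
    {"id": 4, "age": "52", "trestbps": "85", "thalach": "150"},
]
clean = preprocess(raw, keep_fields=["age", "trestbps", "thalach"])
```

Here the duplicate and the incomplete record are removed, leaving two cleaned rows with float values.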
C. Approximate Equal Binning Techniques Based on Expert Advice
Classification of general associations requires categorical data, so numeric attributes are first converted into categorical form: the data variables are binned into a small number of categories. We have used approximate equal interval binning and also taken advice from medical experts. After preprocessing, the attributes below are included. The following summarizes the cut-off values along with the names of the bins for the variables.

Attribute | Description | Cut-off values | Type
Age | Age of the patient (years) | Age<40; Age<60; Age>60 | Young; Middle; Old age
Sex | Gender | 1 = male; 0 = female | -
CP | Chest pain type | Value 1: typical angina; Value 2: atypical angina; Value 3: non-anginal pain; Value 4: asymptomatic | -
Trestbps | Resting blood pressure (mmHg) | BP<80; BP<90; BP>90 | BP-Normal; BP-Normal-to-High; BP-High
Chol | Serum cholesterol (mg/dl) | Chol<200; Chol<400; Chol>400 | Chol-Normal; Chol-High; Chol-Severe
Fbs | Fasting blood sugar > 120 mg/dl | 1 = true; 0 = false | -
Restecg | Resting electrocardiographic results | Val=0; Val=1; Val=2 | Ecr-Normal; Ecr-Abnormal; Ecr-Probable/definite
Thalach | Maximum heart rate achieved | (value) <100; ... | MHR-Normal; MHR-High; MHR-Severe
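The expert cut-offs in the table above can be applied mechanically once each attribute's boundaries are listed. The Python sketch below is our illustration, not the authors' code: CUTOFFS, expert_bin and equal_width_bin are hypothetical names, a value equal to a cut-off (for example exactly 60 years) is assumed to go to the upper bin because the table leaves that case open, and Thalach is omitted because the table gives only its lower cut-off (<100). equal_width_bin shows the plain equal-width case in which the expert-chosen width is passed in as a parameter.

```python
from bisect import bisect_right

# Expert cut-offs and bin names from the table above. A value equal to a cut-off
# (e.g. exactly 60 years) goes to the upper bin; this boundary rule is an assumption.
CUTOFFS = {
    "age": ([40, 60], ["Young", "Middle", "Old age"]),
    "trestbps": ([80, 90], ["BP-Normal", "BP-Normal-to-High", "BP-High"]),
    "chol": ([200, 400], ["Chol-Normal", "Chol-High", "Chol-Severe"]),
}


def expert_bin(attribute, value):
    """Map a numeric value to its named bin using the expert cut-offs."""
    edges, names = CUTOFFS[attribute]
    # bisect_right returns how many cut-offs are <= value, i.e. the bin index.
    return names[bisect_right(edges, value)]


def equal_width_bin(value, minimum, width):
    """Index of the equal-width interval [minimum + i*width, minimum + (i+1)*width)."""
    return int((value - minimum) // width)


# A hypothetical patient becomes a set of categorical items ready for mining.
patient = {"age": 35, "trestbps": 95, "chol": 250}
items = {expert_bin(a, v) for a, v in patient.items()}
```

For this patient the items are Young, BP-High and Chol-High; such item sets are what the association rule miners consume.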

Attribute | Description | Cut-off values | Type (continued)
Exang | Exercise induced angina | 0 or 1 | -
Oldpeak | ST depression induced by exercise relative to rest | (value) <1.5; (value) <2.5; (value) >2.5 | DE-Normal; DE-Normal-to-High; DE-High
Num | Angiographic disease status | (value) <50%; (value) >50% | Valve diameter is low; Valve diameter is high

D. Applying Association Rules
Association rules are no different from classification rules, except that they can predict not only class labels but also any other attribute; they have the freedom to produce any combination of attributes. Different association rules express different regularities that underlie the dataset and generally predict different things, so many association rules can be generated even from a small data set. We keep only those rules that apply to a reasonably large number of instances, based on coverage and accuracy criteria. The coverage of an association rule is the number of instances for which it predicts correctly; this is often called its support. Its accuracy, often called confidence, is the number of instances that it predicts correctly, expressed as a proportion of all instances to which it applies. The user has to specify the minimum coverage and accuracy values and look only for those rules whose values are at least the specified minimum. In the health care system, association rules can be applied as follows:

(Symptoms) (Previous-disease history) ===> (Cause-of-disease)

IF-THEN prediction rules are very popular in data mining; they represent discovered knowledge at a high level of abstraction. Such a rule has the form

IF conditions THEN conclusion

and consists of two parts. The rule antecedent (the IF part) contains one or more conditions about the values of predictor attributes, whereas the rule consequent (the THEN part) contains a prediction about the value of a goal attribute. An accurate prediction of the value of a goal attribute will improve the decision-making process.
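The coverage and confidence definitions in subsection D translate directly into code. In this Python sketch, which is our illustration rather than the authors' implementation, records and rule parts are sets of categorical items; rule_stats and keep_rules are hypothetical names, and the records and candidate rules are made up for illustration.

```python
def rule_stats(records, antecedent, consequent):
    """Coverage (support count) and confidence of IF antecedent THEN consequent."""
    applies = [r for r in records if antecedent <= r]    # IF part holds
    correct = [r for r in applies if consequent <= r]    # THEN part holds as well
    coverage = len(correct)
    confidence = coverage / len(applies) if applies else 0.0
    return coverage, confidence


def keep_rules(rules, records, min_coverage, min_confidence):
    """Keep only rules whose coverage and confidence reach the user's minimums."""
    kept = []
    for antecedent, consequent in rules:
        coverage, confidence = rule_stats(records, antecedent, consequent)
        if coverage >= min_coverage and confidence >= min_confidence:
            kept.append((antecedent, consequent, coverage, confidence))
    return kept


# Hypothetical binned records and candidate rules (not the paper's results).
records = [
    {"Age-Young", "BP-High", "MHR-Severe", "DN-High"},
    {"Age-Young", "BP-High", "MHR-Severe", "DN-High"},
    {"Age-Old", "BP-Normal", "MHR-Normal", "DN-Low"},
    {"Age-Young", "BP-High", "MHR-Normal", "DN-Low"},
]
candidate_rules = [
    ({"Age-Young", "BP-High"}, {"DN-High"}),
    ({"Age-Young", "BP-High", "MHR-Severe"}, {"DN-High"}),
]
kept = keep_rules(candidate_rules, records, min_coverage=2, min_confidence=0.8)
```

The first rule applies to three records but is correct for only two (confidence 2/3), so it is dropped; the second, which has the same shape as an IF-THEN diagnosis rule over age, blood pressure and maximum heart rate, is kept with coverage 2 and confidence 1.0.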
E. Applying Pruning-Classification Association Rule on Heart Attack Patient Dataset
The Pruning-Classification Association Rule (PCAR) algorithm identifies frequent item sets by pruning infrequent items. It first deletes infrequent items from item sets, then classifies item sets based on their frequency, and finally discovers frequent item sets. PCAR combines minimum frequency items with minimum frequency item sets. The number of candidate item sets is greatly reduced, and item sets need not be combined or decomposed; therefore, operation time and memory requirements decrease accordingly. This method has a significant advantage in mining association rules over large volumes of items with small item-set frequencies. Its procedure is shown in Fig. 2.

Fig-2: Procedure of the PCAR algorithm

V. EXPERIMENTAL RESULTS
Table: Association rules generated by PCAR (columns: S.No, Rules, Confidence), with confidence values ranging from 63% to 92%.

Example 1: IF-THEN rule induced in the diagnosis of the level of disease status in blood
IF (Age < 40 - Young) and (Blood Pressure > 90 - High) and (Maximum Heart Rate Achieved > 125 - Severe) THEN Diagnosis = Valve diameter narrowing is High (>= 50%)

VI. CONCLUSION
We studied the problem of constraining and summarizing different algorithms of data mining. We focused on using different algorithms for predicting combinations of several target attributes.

This paper presents an effective heart attack prediction system using PCAR, an efficient approach for mining association rules, together with a methodology for generating association rules from heart disease warehouses for heart attack prediction. The experimental results have illustrated the efficacy of the designed prediction system in predicting heart attack. Thirty attributes are listed as significant for predicting heart attack. In future work, the system can be further enhanced and expanded: continuous data can be used instead of just categorical data, and text mining can be used to mine the vast amount of unstructured data available in healthcare databases.

REFERENCES
1. Frawley and Piatetsky-Shapiro, Knowledge Discovery in Databases: An Overview, The AAAI/MIT Press, Menlo Park, CA, 1996.
2. Hsinchun Chen, Sherrilynne S. Fuller, Carol Friedman, and William Hersh, "Knowledge Management, Data Mining, and Text Mining In Medical Informatics", Chapter 1, eds. Medical Informatics: Knowledge Management And Data Mining In Biomedicine, New York, Springer, pp. 3-34, 2005.
3. Sellappan Palaniappan, Rafiah Awang, "Intelligent Heart Disease Prediction System Using Data Mining Techniques", IJCSNS International Journal of Computer Science and Network Security, Vol. 8, No. 8, August 2008.
4. Andreeva P., M. Dimitrova and A. Gegov, "Information Representation in Cardiological Knowledge Based System", SAER'06, pp. 23-25 Sept. 2006.
5. Tzung-I Tang, Gang Zheng, Yalou Huang, Guangfu Shu, Pengtao Wang, "A Comparative Study of Medical Data Classification Methods Based on Decision Tree and System Reconstruction Analysis", IEMS, Vol. 4, No. 1, pp. 102-108, June 2005.
6. Franck Le Duff, Cristian Munteanb, Marc Cuggiaa, Philippe Mabob, "Predicting Survival Causes After Out of Hospital Cardiac Arrest using Data Mining Method", Studies in health technology and informatics, Vol. 107, No. Pt 2, pp. 1256-9, 2004.
7. Boleslaw Szymanski, Long Han, Mark Embrechts, Alexander Ross, Karsten Sternickel, Lijuan Zhu, "Using Efficient Supanova Kernel For Heart Disease Diagnosis", proc. ANNIE 06, intelligent engineering systems through artificial neural networks, vol. 16, pp. 305-310, 2006.
8. Heon Gyu Lee, Ki Yong Noh, Keun Ho Ryu, "Mining Biosignal Data: Coronary Artery Disease Diagnosis using Linear and Nonlinear Features of HRV", LNAI 4819: Emerging Technologies in Knowledge Discovery and Data Mining, May 2007.
9. Niti Guru, Anil Dahiya, Navin Rajpal, "Decision Support System for Heart Disease Diagnosis Using Neural Network", Delhi Business Review, Vol. 8, No. 1 (January - June 2007).
10. S Stilou, P D Bamidis, N Maglaveras, C Pappas, "Mining association rules from clinical databases: an intelligent diagnostic process in healthcare", Stud Health Technol Inform 84: Pt 2, pp. 1399-1403, 2001.
11. Carlos Ordonez, "Improving Heart Disease Prediction Using Constrained Association Rules", Seminar Presentation at University of Tokyo, 2004.
12. Carlos Ordonez, "Improving Heart Disease Prediction Using Constrained Association Rules", Seminar Presentation at University of Tokyo, 2004.
13. Newman, D.J., Hettich, S., Blake, C.L. and Merz, C.J., "UCI Repository of machine learning databases", Irvine, CA: University of California, Department of Information and Computer Science, 1998. Last accessed: 1/10/2009.
14. Gerhard Münz, Sa Li, and Georg Carle, "Traffic anomaly detection using k-means clustering", In Proc. of Leistungs-, Zuverlässigkeits- und Verlässlichkeitsbewertung von Kommunikationsnetzen und Verteilten Systemen, 4. GI/ITG-Workshop MMBnet 2007, Hamburg, Germany, September 2007.
15. Wynne Hsu, Mong-Li Lee, Bing Liu, Tok Wang Ling, "Exploration mining in diabetic patients databases: findings and conclusions", KDD 2000, pp. 430-436.