Intelligent Heart Disease Prediction System Using Naïve Bayes

A major challenge facing healthcare organizations (hospitals, medical centers) is the provision of quality services at affordable costs. Quality service implies diagnosing patients correctly and administering treatments that are effective. Poor clinical decisions can lead to disastrous consequences, which are therefore unacceptable. Hospitals must also minimize the cost of clinical tests. They can achieve these results by employing appropriate computer-based information and/or decision support systems. Most hospitals today employ some sort of hospital information system to manage their healthcare or patient data. These systems are designed to support patient billing, inventory management and generation of simple statistics. Some hospitals use decision support systems, but they are largely limited. Clinical decisions are often made based on doctors’ intuition and experience rather than on the knowledge-rich data hidden in the database. This practice leads to unwanted biases, errors and excessive medical costs, which affect the quality of service provided to patients. The main objective of this research is to develop an Intelligent Heart Disease Prediction System using a data mining modeling technique, namely Naïve Bayes. It is implemented as a web based questionnaire application. Based on the user's answers, it can discover and extract hidden knowledge (patterns and relationships) associated with heart disease from a historical heart disease database. It can answer complex queries for diagnosing heart disease and thus assist healthcare practitioners in making intelligent clinical decisions, which traditional decision support systems cannot. By providing effective treatments, it also helps to reduce treatment costs.

HARDWARE CONFIGURATION
• Intel Pentium IV
• 256/512 MB RAM
• 1 GB free disk space or greater
• 1 GB on boot drive
• 17” XVGA display monitor
• 1 Network Interface Card (NIC)

SOFTWARE CONFIGURATION
• MS Windows XP/2000
• MS IE Browser 6.0/later
• MS DotNet Framework 2.0
• MS Visual Studio.Net 2005
• Internet Information Server (IIS)
• MS SQL Server 2000
• Windows Installer 3.1


C#.Net
C# is an object-oriented programming language developed by Microsoft as part of the .Net initiative. C# is intended to be a simple, modern, general-purpose, object-oriented programming language. Because software robustness, durability and programmer productivity are important, the language includes strong type checking, array bounds checking, detection of attempts to use uninitialized variables, source code portability, and automatic garbage collection. C# is intended to be suitable for writing applications for both hosted and embedded systems, ranging from the very large that use sophisticated operating systems, down to the very small having dedicated functions. C# applications are intended to be economical with regard to memory and processing power requirements. Programmer portability is a very important feature of C#. Although a C# compiler could generate machine code like traditional compilers of C++ or FORTRAN, in practice C# implementations target the Common Language Infrastructure (CLI). C# is more type safe than C++. The only implicit conversions by default are those which are considered safe, such as widening of integers and conversion from a derived type to a base type. C# is the programming language that most directly reflects the underlying Common Language Infrastructure (CLI). Most of C#'s intrinsic types correspond to value types implemented by the CLI framework. C# supports a strict boolean type, bool. Statements that take conditions, such as while and if, require an expression of a boolean type.

FEATURES OF C#.NET
• Visual Studio.Net is a tool-rich programming environment containing all the functionality for handling C# projects.
• The .Net integrated development environment provides enormous advantages for programmers.
• C# is directly related to C, C++ and Java. C# is a case sensitive language and it is designed to produce portable code.
• C# includes features that directly support the constituents of components, such as properties, methods and events.
• C# is an object oriented language which supports all object oriented programming (OOP) concepts such as encapsulation, polymorphism and inheritance.
• Encapsulation is a programming mechanism that binds code and the data it manipulates together, and keeps both safe from outside interference and misuse.
• Polymorphism is the quality that allows one interface to access a general class of actions.

• Inheritance is the process by which one object can acquire the properties of another object.
• All methods and members must be declared within classes. There are no global variables or functions.
• Multiple inheritance is not supported, although a class can implement any number of interfaces.
• Intellisense displays the name of every member of a class.
• Interoperability, simplicity, performance, cross language integration and language independence are important features of C#.
• C# allows us to create both managed and unmanaged applications.

FEATURES OF WINDOWS XP
The major features of Windows XP Professional are:
• Reliable
• Easy to use and maintain
• Internet ready

Full 32-bit Operating System: Minimizes the chance of application failures and unplanned reboots.
Windows File Protection: Protects core system files from being overwritten by application installs. In the event a file is overwritten, Windows File Protection will replace that file with the correct version. By safeguarding system files in this manner, Windows XP mitigates many common system failures found in earlier versions of Windows.
Driver Certification: Provides safeguards to assure that device drivers have not been tampered with, reducing the risk of installing non-certified drivers.

Microsoft Installer: Works with the Windows Installer service, helping users install, configure, track, upgrade, and remove software programs correctly, minimizing the risks of user error and possible loss of productivity.
System Preparation Tool (SysPrep): Helps administrators clone computer configurations, systems, and applications, resulting in simpler, faster, more cost-effective deployment.
Remote OS Installation: Permits Windows XP Professional to be installed across the network (including SysPrep images). Remote OS Installation saves time and reduces deployment costs.
Multilingual Support: Allows users to easily create, read, and edit documents in hundreds of languages.
Faster Performance: Provides 25 percent faster performance than Windows 95 and Windows 98 on systems with 64 megabytes (MB) or more of memory.
Faster Multitasking: Uses a full 32-bit architecture, allowing you to run more programs and perform more tasks at the same time than Windows 95 or Windows 98.
Scalable Memory and Processor Support: Supports up to 4 gigabytes (GB) of RAM and up to two symmetric multiprocessors.

Peer-to-Peer Support for Windows 95/98 and Windows NT: Enables Windows 2000 Professional to interoperate with earlier versions of Windows on a peer-to-peer level, allowing the sharing of all resources, such as folders, printers and peripherals.
Search Bar: Helps users quickly search for different types of information, such as Web pages or people's addresses, and choose which search engine to use, all from one location.
History Bar: Helps users find the way back to sites viewed in the past. The History bar tracks not only Web sites, but also intranet sites, network servers and local folders.
Favorites: Helps users find and organize relevant information, whether it is stored in files, folders or Web sites.
Internet Information Services (IIS) 5.0: Includes web and FTP server support, as well as support for Front Page transactions, Active Server Pages, and database connections. Available as an optional component, IIS 5.0 is installed automatically for those upgrading from versions of Windows with Personal Web Server installed.
Strong Development Platform: Support for Dynamic HTML Behaviors and XML gives developers the broadest range of options, with the fastest development time.

SQL SERVER 2005

The SQL Server Web Data Administrator enables us to easily manage our SQL Server data, wherever we are. Its built-in features can be used from Microsoft Internet Explorer or our favorite web browser.

Microsoft SQL Server 2000 introduces several server improvements and new features:
XML Support: The relational database engine can return data as Extensible Markup Language (XML) documents. Additionally, XML can be used to insert, update, and delete values in the database.
Federated Database Servers: SQL Server 2000 supports enhancements to distributed partitioned views that allow you to partition tables horizontally across multiple servers. This allows you to scale out one database server to a group of database servers that cooperate to provide the same performance levels as a cluster of database servers. This group, or federation, of database servers can support the data storage requirements of the largest Web sites and enterprise data processing systems.
Net-Library Support: SQL Server 2000 introduces Net-Library support for Virtual Interface Architecture (VIA) system-area networks that provide high-speed connectivity between servers, such as between application servers and database servers.

Existing system
• Clinical decisions are often made based on doctors’ intuition and experience rather than on the knowledge-rich data hidden in the database.
• This practice leads to unwanted biases, errors and excessive medical costs, which affect the quality of service provided to patients.

• Medical misdiagnoses are a serious risk to our healthcare profession. If they continue, then people will fear going to the hospital for treatment. We can put an end to medical misdiagnosis by informing the public and filing claims and suits against the medical practitioners at fault.
• There are many ways that a medical misdiagnosis can present itself. Whether a doctor is at fault, or hospital staff, a misdiagnosis of a serious illness can have very extreme and harmful effects.
• The National Patient Safety Foundation cites that 42% of medical patients feel they have experienced a medical error or missed diagnosis. Patient safety is sometimes negligently given the back seat for other concerns, such as the cost of medical tests, drugs, and operations.

Proposed Systems
• The existing practice leads to unwanted biases, errors and excessive medical costs, which affect the quality of service provided to patients.
• Thus we proposed that integration of clinical decision support with computer-based patient records could reduce medical errors, enhance patient safety, decrease unwanted practice variation, and improve patient outcome.
• This suggestion is promising as data modeling and analysis tools, e.g., data mining, have the potential to generate a knowledge-rich environment which can help to significantly improve the quality of clinical decisions.
• The main objective of this research is to develop a prototype Intelligent Heart Disease Prediction System (IHDPS) using three data mining modeling techniques, namely, Decision Trees, Naïve Bayes and Neural Network.

• By providing effective treatments, it also helps to reduce treatment costs. It is also intended to enhance visualization and ease of interpretation.

Modules:
Analyzing the Data set:
A data set (or dataset) is a collection of data, usually presented in tabular form. Each column represents a particular variable. Each row corresponds to a given member of the data set in question. It lists values for each of the variables, such as height and weight of an object or values of random numbers. Each value is known as a datum. The data set may comprise data for one or more members, corresponding to the number of rows. The values may be numbers, such as real numbers or integers, for example representing a person's height in centimeters, but may also be nominal data (i.e., not consisting of numerical values), for example representing a person's ethnicity. More generally, values may be of any of the kinds described as a level of measurement. For each variable, the values will normally all be of the same kind. However, there may also be "missing values", which need to be indicated in some way.

A total of 500 records with 15 medical attributes (factors) were obtained from the Heart Disease database. Figure 1 lists the attributes. The records were split equally into two datasets: training dataset (455 records) and testing dataset (454 records). To avoid bias, the records for each set were selected randomly. The attribute “Diagnosis” was identified as the predictable attribute with value “1” for patients with heart disease and value “0” for patients with no heart disease. The attribute “PatientID” was used as the key; the rest are input attributes. It is assumed that problems such as missing data, inconsistent data, and duplicate data have all been resolved. In our project the data set is stored in a .dat file; our file reader program reads the records from it as input to the Naïve Bayes based mining process.
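The random, equal split into training and testing sets described above can be sketched as follows. This is a minimal Python illustration, not the project's C# code; the function name `split_records`, the fixed seed and the nine placeholder records are assumptions made for the example.

```python
import random

def split_records(records, seed=0):
    """Shuffle a copy of the records and split it into two near-equal halves."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # random selection avoids ordering bias
    half = (len(shuffled) + 1) // 2        # training set takes the extra record if the count is odd
    return shuffled[:half], shuffled[half:]

# Hypothetical records: (PatientID, Diagnosis) pairs standing in for full rows.
records = [(patient_id, patient_id % 2) for patient_id in range(1, 10)]
training, testing = split_records(records)
print(len(training), len(testing))
```

Fixing the seed keeps the split reproducible between runs, which makes later comparisons of models easier; a real run might draw the seed differently.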

Naïve Bayes Implementation in Mining:
For a more in-depth introduction to density estimation and the general use of Bayes classifiers, with Naïve Bayes classifiers as a special case, the tutorial "Probability for Data Mining" is recommended. This section gives only the bottom line on learning and using Naïve Bayes classifiers on categorical attributes.

Bayes' Theorem finds the probability of an event occurring given the probability of another event that has already occurred. If B represents the dependent event and A represents the prior event, Bayes' theorem can be stated as follows.

Bayes' Theorem: Prob(B given A) = Prob(A and B)/Prob(A)

To calculate the probability of B given A, the algorithm counts the number of cases where A and B occur together and divides it by the number of cases where A occurs.

Naïve Bayes can also be applied to data with some numerical attributes, using the Laplace correction, to predict the class of a new example.
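The counting procedure and the Laplace correction mentioned above can be illustrated with a small Python sketch for categorical attributes. It is an assumption-laden example rather than the project's implementation: the function names `train` and `predict` and the two-attribute sample rows (sex, exang) are hypothetical.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Count class frequencies and per-attribute value frequencies within each class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> Counter of values
    domains = defaultdict(set)           # attribute index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains

def predict(model, row):
    """Return the class with the highest Laplace-corrected Naive Bayes score."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # prior Prob(class)
        for i, value in enumerate(row):
            # Laplace correction: add 1 to every count so an unseen value never gives zero.
            score *= (value_counts[(i, label)][value] + 1) / (n_label + len(domains[i]))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training rows: (sex, exang) with Diagnosis labels 1 or 0.
rows = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1)]
labels = [1, 1, 1, 0, 0, 0]
model = train(rows, labels)
print(predict(model, (1, 1)), predict(model, (0, 0)))
```

The "naïve" part is the product over attributes, which treats them as independent given the class. Numerical attributes would first need to be binned or modelled with a density, which this sketch does not attempt.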

Designing the Questionnaire:
Questionnaires have advantages over some other ways of collecting medical symptoms in that they are cheap, do not require as much effort from the questioner as verbal or telephone surveys, and often have standardized answers that make it simple to compile data. However, such standardized answers may frustrate users. Questionnaires are also sharply limited by the fact that respondents must be able to read the questions and respond to them. Our questionnaire is based on the attributes given in the data set, so it contains:

Input attributes
1. Sex (value 1: Male; value 0: Female)
2. Chest Pain Type (value 1: typical type 1 angina; value 2: typical type angina; value 3: non-angina pain; value 4: asymptomatic)
3. Fasting Blood Sugar (value 1: > 120 mg/dl; value 0: < 120 mg/dl)
4. Restecg – resting electrocardiographic results (value 0: normal; value 1: having ST-T wave abnormality; value 2: showing probable or definite left ventricular hypertrophy)
5. Exang – exercise induced angina (value 1: yes; value 0: no)
6. Slope – the slope of the peak exercise ST segment (value 1: upsloping; value 2: flat; value 3: downsloping)
7. CA – number of major vessels colored by fluoroscopy (value 0 – 3)
8. Thal (value 3: normal; value 6: fixed defect; value 7: reversible defect)
9. Trest Blood Pressure (mm Hg on admission to the hospital)

10. Serum Cholesterol (mg/dl)
11. Thalach – maximum heart rate achieved
12. Oldpeak – ST depression induced by exercise relative to rest
13. Age in years
14. Height in cm
15. Weight in kg

Heart Disease In WEB
In our heart disease development, the modeling and the standardized notations allow complex ideas to be expressed in a precise way, facilitating communication among the project participants, who generally have different technical and cultural knowledge. MVC architecture has had wide acceptance for corporate software development. It divides the system into three different layers that are in charge of interface, control logic and data access; this facilitates the maintenance and evolution of systems, owing to the independence of the classes in each layer. With the purpose of illustrating a successful application built under MVC, in this work we introduce different phases of analysis, design and implementation of a database and web application.

ASP.Net Web Application is a group of interrelated web development techniques used for creating interactive web applications or rich Internet applications. With ASP.Net Web Application, web applications can retrieve data from the server asynchronously in the background without interfering with the display and behavior of the existing page. In many cases, the pages on a website consist of much content that is common between them. Using traditional methods, that content would have to be reloaded on every request. However, using ASP.Net Web Application, a web application can request only the content that needs to be updated, thus drastically reducing bandwidth usage and load time. The use of asynchronous requests allows the client's Web browser UI to be more interactive and to respond quickly to inputs, and sections of pages can also be reloaded individually. Users may perceive the application to be faster or more responsive, even if the application has not changed on the server side. The use of ASP.Net Web Application can reduce connections to the server, since scripts and style sheets only have to be requested once.
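Before the questionnaire answers reach the classifier, the control layer could check them against the coded values in the input attribute list. The Python sketch below shows one way to do that under stated assumptions: the field names, the `validate_answers` helper and the sample answers are hypothetical and do not come from the project's code.

```python
# Allowed codes for the categorical questionnaire items, taken from the attribute list.
ALLOWED_CODES = {
    "sex": {0, 1},
    "chest_pain_type": {1, 2, 3, 4},
    "fasting_blood_sugar": {0, 1},
    "restecg": {0, 1, 2},
    "exang": {0, 1},
    "slope": {1, 2, 3},
    "ca": {0, 1, 2, 3},
    "thal": {3, 6, 7},
}
# The remaining items are numeric measurements (hypothetical field names).
NUMERIC_FIELDS = [
    "trest_blood_pressure", "serum_cholesterol", "thalach",
    "oldpeak", "age", "height_cm", "weight_kg",
]

def validate_answers(answers):
    """Return a list of problems found in one set of questionnaire answers."""
    problems = []
    for field, allowed in ALLOWED_CODES.items():
        if answers.get(field) not in allowed:
            problems.append(f"{field}: expected one of {sorted(allowed)}")
    for field in NUMERIC_FIELDS:
        value = answers.get(field)
        if not isinstance(value, (int, float)) or isinstance(value, bool) or value < 0:
            problems.append(f"{field}: expected a non-negative number")
    return problems

# Hypothetical answers from one patient.
answers = {
    "sex": 1, "chest_pain_type": 4, "fasting_blood_sugar": 0, "restecg": 2,
    "exang": 1, "slope": 2, "ca": 3, "thal": 7, "trest_blood_pressure": 160,
    "serum_cholesterol": 286, "thalach": 108, "oldpeak": 1.5,
    "age": 67, "height_cm": 170, "weight_kg": 80,
}
print(validate_answers(answers))
```

Keeping the allowed codes in one table means the questionnaire form and the validation step can share a single definition of each attribute.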


Models for Data Mining: Business understanding, Data understanding, Data preparation, Modeling, Evaluation, Deployment.

Data Flow Diagram

Heart_disease_297

Field Name                      Data Type     Not Null
ID (PK)                         Int(4)        Yes
age                             Numeric(9)    No
sex                             Char(50)      No
Chest_pain_type                 Numeric(9)    No
Trest_blood_pressure            Numeric(9)    No
Serum_cholestorral              Numeric(9)    No
Fasting_blood_sugar             Numeric(9)    No
Resting_electrocardiographic    Numeric(9)    No
Maximum_heart_rate              Numeric(9)    No
Exerice_induced_angina          Numeric(9)    No
St_depression_induced           Numeric(9)    No
Slope_of_the_peak               Numeric(9)    No
Number_of major_vessels         Numeric(9)    No
thal                            Numeric(9)    No
output                          Numeric(9)    No

SYSTEM TESTING AND IMPLEMENTATION
The system testing verifies the whole set of programs that hang together.

SYSTEM TESTING
Testing is a series of different tests whose primary purpose is to fully exercise the computer based system. Although each test has a different purpose, all work should verify that all system elements have been properly integrated and perform their allocated functions. Testing is the process of checking whether the developed system works according to the actual requirements and objectives of the system.

The philosophy behind testing is to find the errors. A good test is one that has a high probability of finding an undiscovered error. A test case is a set of data that the system will process as an input. The data are created with the intent of determining whether the system will process them correctly, without any errors, to produce the required output. Testing is very important before the system is accepted by the user; it eliminates communication problems, programmer negligence or time constraints, which cause errors. The strategies for testing include unit testing, integration testing, system testing and implementation testing.

Testing is performed to ensure that the software functions appear to be working according to the specifications and that the performance requirements of the system are met. It is the process of executing a program with the intent of finding errors. Test cases are devised with this purpose in mind. Testing could be viewed as destructive rather than constructive. Good testing is testing that will uncover different classes of errors with a minimum amount of time and effort. In the proposed system testing is done.

Testing Methodologies
Black Box Testing

Black box testing, also called behavioral testing, focuses on the functional requirements of the software. That is, black box testing enables the software engineer to derive sets of input conditions that will fully exercise all functional requirements for a program. Black box testing attempts to find errors in the following categories:
• Incorrect or missing functions.
• Interface errors.
• Errors in data structures or external database access.
• Behavior or performance errors.
• Initialization and termination errors.

Functional Testing
Black box type testing geared to the functional requirements of an application. This type of testing should be done by testers. Our project does functional testing of what input is given and what output should be obtained.

System Testing
Black box type testing that is based on overall requirements specifications; it covers all combined parts of a system. The system testing done here checks all the peripherals used in the project.

Performance Testing
Term often used interchangeably with ‘stress’ and ‘load’ testing. Ideally ‘performance’ testing is defined in requirements documentation or QA or Test Plans.

Stress Testing
Term often used interchangeably with ‘load’ and ‘performance’ testing. Also used to describe such tests as system functional testing while under unusually heavy loads, heavy repetition of certain actions or inputs, or input of large numerical values.

White Box Testing
White box testing, sometimes called glass box testing, is a test case design method that uses the control structure of the procedural design to derive test cases. Using white box testing methods, the software engineer can derive test cases that:
• Guarantee that all independent paths within a module have been exercised at least once.
• Exercise all logical decisions on their true and false sides.
• Execute all loops at their boundaries and within their operational bounds.
• Exercise internal data structures to ensure their validity.

Unit Testing
The most ‘micro’ scale of testing, used to test particular functions or code modules. Typically, it is done by the programmer and not by the tester, as it requires detailed knowledge of the internal program design and code. It is not always easily done unless the application has a well designed architecture with tight code, and it may require developing test modules or test harnesses.

Quality Assurance
Software Quality Assurance involves the entire software development process: monitoring and improving the process, making sure that any agreed-upon standards and procedures are followed, and ensuring that problems are found and dealt with. It is oriented to ‘prevention’.

Software Life Cycle
The life cycle begins when an application is first conceived and ends when it is no longer in use. It includes aspects such as initial concept, requirements analysis, functional design, internal design, documentation planning, test planning, coding, document preparation, integration, testing, maintenance, updates, retesting, phase-out, and other aspects.

Verification and Validation
Verification refers to the set of activities that ensure that software correctly implements a specific function. Validation refers to a different set of activities that ensure that the software that has been built is traceable to customer requirements.

Verification and validation encompasses a wide array of SQA activities that include formal technical reviews, quality and configuration audits, performance monitoring, simulation, feasibility study, documentation review, database review, algorithm analysis, development testing, qualification testing and installation testing.

SYSTEM IMPLEMENTATION
System implementation is the stage of the project where the theoretical design is turned into a working system. Implementation is a process that means converting a new system into operation. The most crucial part of this stage is achieving user confidence that the new system will work effectively and efficiently. Proper implementation is essential to provide a reliable system that meets organization requirements. During the implementation stage a live demo was undertaken in front of end-users. The stage consists of the following steps.
• Testing the developed program with sample data.
• Detection and correction of internal errors.
• Testing the system to meet the user requirements.
• Feeding the real time data and retesting.
• Making necessary changes as described by the user.
The system was implemented successfully. The performance and reliability of the system were tested and it gained acceptance.
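Testing the developed program with sample data, the first step listed above, can be made concrete by comparing predicted and actual Diagnosis values in a classification matrix. The Python sketch below is only an illustration: the helper names and the eight sample labels are hypothetical, and the predictions are typed in by hand rather than produced by the project's model.

```python
from collections import Counter

def classification_matrix(actual, predicted):
    """Count (actual, predicted) pairs for the binary Diagnosis attribute (1 = disease, 0 = none)."""
    counts = Counter(zip(actual, predicted))
    return {
        "true_positive": counts[(1, 1)],
        "false_negative": counts[(1, 0)],
        "false_positive": counts[(0, 1)],
        "true_negative": counts[(0, 0)],
    }

def accuracy(matrix):
    """Share of sample records whose predicted class matches the actual class."""
    total = sum(matrix.values())
    return (matrix["true_positive"] + matrix["true_negative"]) / total if total else 0.0

# Hypothetical sample data: actual diagnoses and the classes a model predicted for them.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
matrix = classification_matrix(actual, predicted)
print(matrix, accuracy(matrix))
```

In a diagnostic setting the false negative count deserves separate attention, since a missed case of heart disease is usually costlier than a false alarm; accuracy alone hides that difference.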

Conclusion
A prototype heart disease prediction system is developed using three data mining classification modeling techniques, namely Decision Trees, Naïve Bayes and Neural Network. The system extracts hidden knowledge from a historical heart disease database. DMX query language and functions are used to build and access the models. The models are trained and validated against a test dataset. Lift Chart and Classification Matrix methods are used to evaluate the effectiveness of the models. All three models are able to extract patterns in response to the predictable state. The most effective model to predict patients with heart disease appears to be Naïve Bayes, followed by Neural Network and Decision Trees.

Five mining goals are defined based on business intelligence and data exploration. The goals are evaluated against the trained models. All three models could answer complex queries, each with its own strength with respect to ease of model interpretation, access to detailed information and accuracy. Naïve Bayes could answer four out of the five goals; Decision Trees, three; and Neural Network, two. Although not the most effective model, Decision Trees results are easier to read and interpret. The drill through feature to access detailed patients’ profiles is only available in Decision Trees. Naïve Bayes fared better than Decision Trees as it could identify all the significant medical predictors. The relationship between attributes produced by Neural Network is more difficult to understand.

IHDPS can be further enhanced and expanded. For example, it can incorporate other medical attributes besides the 15 listed in Figure 1. It can also incorporate other data mining techniques, e.g., Time Series, Clustering and Association Rules. Continuous data can also be used instead of just categorical data. Another area is to use Text Mining to mine the vast amount of unstructured data available in healthcare databases. Another challenge would be to integrate data mining and text mining.

Future Enhancements
• As we get a data set, we proceed with our work by implementing the Naïve Bayes algorithm.
• The formula used in it is:

  ◦ Prob(B given A) = Prob(A and B) / Prob(A)
• To get input from the client side, we have yet to prepare a questionnaire (for the patient).
• The modeling and the standardized notations used allow complex ideas to be expressed in a precise way through the web.
