
HEART DISEASE PREDICTION

SUBMITTED FOR PARTIAL FULFILLMENT OF THE DEGREE

OF

BACHELOR OF TECHNOLOGY

(COMPUTER SCIENCE ENGINEERING)

Submitted By:
Dinesh Kumar
1706419

Submitted To:
Prof. Priyanka Arora
Prof. Inderjit Singh
Prof. Kapil Sharma

Department of Computer Science and Engineering

GURU NANAK DEV ENGINEERING COLLEGE

LUDHIANA 141006

ACKNOWLEDGEMENT

I am highly grateful to Dr. Sehijpal Singh, Principal, Guru Nanak Dev Engineering College
(GNDEC), Ludhiana, for providing this opportunity to carry out the six-month industrial training at
ANSH Infotech.

The constant guidance and encouragement received from Prof. G.S. Sodhi, Dean T&P, GNDEC
Ludhiana, has been of great help in carrying out the project work and is acknowledged with
reverential thanks.

I would like to express a deep sense of gratitude and profuse thanks to Mr. Ansh Aneja,
Director/CEO of the company. Without his wise counsel and able guidance, it would have been
impossible to complete the report in this manner.

The help rendered by Mr. Akashdeep Singh, Data Scientist of the CSE department, in
experimentation is greatly acknowledged.

I express gratitude to the other faculty members of the Information Technology department of GNDEC
for their intellectual support throughout the course of this work.

Finally, I am indebted to all who have contributed to this report work and to a friendly stay at
ANSH Infotech.

Dinesh Kumar.

TABLE OF CONTENTS

CONTENTS PAGE NO.

CERTIFICATE II

ACKNOWLEDGEMENT III

LIST OF FIGURES V

LIST OF TABLES VI

[Chapter 1] Introduction To Company 1

[Chapter 2] Introduction To Project 2-5

2.1) Overview 2

2.2) Existing System 2-3

2.3) User Requirements Analysis 3

2.4) Feasibility Study 3-4

2.5) Objectives Of Project 5

[Chapter 3] Product Design 6-8

3.1) User Requirements 6

3.2) Use Case Model/Flow Chart 6

3.3) Database Design 6-7

3.4) ER Diagram 8

3.5) Specific Requirement 8

[Chapter-4] Development And Implementation 9-22

4.1)Introduction To Languages 9-13

4.2)Implementation With Screen Shots 13-19


4.3) Testing 19-22

[Chapter-5] Conclusion And Future Scope 23-24

5.1) Conclusion 23

5.2) Future Scope 24

REFERENCE 25

LIST OF FIGURES

Fig. No.    Fig. Description    Page No.

Figure 1 ER Diagram 8

Figure 2 Functional Requirement 8

Figure 3 Machine Learning Algorithms 11

LIST OF TABLES

Table No. Table Description Page No.

Table 1 Data Flow Diagram 6

[Chapter-1] Introduction To Company

ANSH InfoTech is an information technology and product-based company in India. ANSH InfoTech is
committed to empowering business transformation. The company was founded on core values of being
creative, people driven, socially conscious, intense and ethical. The company provides software
products and IT services for a variety of industries, including insurance, banking, capital markets
and retail. These products and services include managed IT services, application software
development and maintenance, payment systems, business intelligence, web development, IT consulting
and various transaction processing services.

We undertake projects located around the world and are committed to creating a challenging and
rewarding work environment in which our team members are goal focused and motivated throughout our
clients' engagement. Our goal is to provide the most cost-effective solutions while keeping in mind
the urgency to launch products without sacrificing quality. Excellence lies at our heart; we
constantly challenge convention. Excellence, Partnership and Commitment are the three hallmarks with
which we approach our clients, and you will see this in our service. The company also has a training
division that offers learning management and training delivery solutions to corporations,
institutions and individuals. It is a global leader in skills and talent development with three main
lines of business: the Corporate Learning Group (CLG), the Skills and Careers Group (SCG), and (SLG).


[Chapter-2] Introduction To Project

[2.1] Overview

Prediction and diagnosis of heart disease have become a challenging problem faced by doctors and
hospitals both in India and abroad. In order to reduce the large number of deaths from heart disease,
a quick and efficient detection technique needs to be found. Data mining techniques and machine
learning algorithms play a very important role in this area. Researchers are accelerating their work
to develop software, with the help of machine learning algorithms, that can help doctors take
decisions regarding both the prediction and the diagnosis of heart disease. The main objective of
this project is to predict the heart disease of a patient using machine learning algorithms. A
comparative study of the performance of various machine learning algorithms is carried out through
graphical representation of the results. The highest mortality both in India and abroad is due to
heart disease, so it is vital to check this death toll by correctly identifying the disease at an
initial stage. The problem has become a headache for doctors both in India and abroad. Nowadays
doctors are adopting many scientific technologies and methodologies for the identification and
diagnosis of not only common diseases but also many fatal ones. Successful treatment is always
attributed to a right and accurate diagnosis. Doctors may sometimes fail to take accurate decisions
while diagnosing the heart disease of a patient; heart disease prediction systems that use machine
learning algorithms assist in such cases to get accurate results.

[2.2] Existing System

Though there are many existing systems, they tend to be very expensive: if a user wants regular heart
check-ups, hospital visits become quite costly. A better alternative is a heart disease prediction
system developed and modified using the Python language, which can give the user an idea of whether
he or she is likely to face any heart problem in the future. Through this, the user can regularly
check for heart problems and get a rough idea without spending a huge amount of money in hospitals.

Limitations of the existing system


1) It is a time-consuming process.
2) It is very expensive.

2.3) User Requirements Analysis


A software requirements specification (SRS) is a detailed description of a software system to be
developed, including its functional and non-functional requirements. The SRS is developed based on
the agreement between the customer and the contractors. It may include use cases describing how the
user will interact with the software system. The SRS document contains all the requirements
necessary for project development. To develop the software system we should have a clear
understanding of it; to achieve this, we need continuous communication with customers to gather all
requirements.

A good SRS defines how the software system will interact with all internal modules, hardware and
other programs, and how human users will interact with it across a wide range of real-life scenarios.
Using the SRS document, QA leads and managers create the test plan. It is very important that testers
are clear about every detail specified in this document in order to avoid faults in test cases and
their expected results.

It is highly recommended to review the SRS document before starting to write test cases and making
any plan for testing.

2.3.1) SOFTWARE REQUIREMENTS


The technical specifications of requirements for the software are as follows:

1) Operating System: Windows XP or later.
2) Windows 7 or later.
3) Linux / Ubuntu (excluding Kali Linux).
4) Python version 3.5 or higher.

2.3.2) HARDWARE REQUIREMENTS

Hardware requirements specify the hardware needed for the system to work. They include:
1) Pentium 4 Computer (Minimum).
2) 512 MB RAM (Minimum).
3) Hard Disk - 40GB (Minimum).
4) Network Interface (For Communication).

[2.4] Feasibility Study


The feasibility of the project is analysed in this phase to put forth a very general plan for the project
and some cost estimates. During system analysis, the feasibility study of the proposed system is
carried out. This is to ensure that the proposed system is not a burden to the company. For
feasibility analysis, some understanding of the major requirements for the system is essential. Three
key considerations involved in the feasibility analysis are:
1) Technical Feasibility
2) Economic Feasibility
3) Operational Feasibility

[2.4.1] TECHNICAL FEASIBILITY

This assessment is based on an outline design of system requirements, to determine whether the
organization has the technical expertise to handle completion of the project. When writing the report, the
following were under consideration:
1) A brief description of the framework to assess further factors which could affect the study
2) The part of the business being examined (marketing, economic)
3) The human and economic factors
4) The possible solutions to the problem within minimum time
The technical feasibility assessment is focused on gaining an understanding of the present technical
resources of the organization and their applicability to the expected needs of the proposed system. It is
an evaluation of the hardware and software and how they meet the needs of the proposed system.

After the above analysis, it was concluded that the system is technically feasible.

[2.4.2] ECONOMIC FEASIBILITY

The purpose of an economic feasibility study is to demonstrate the net benefit of a proposed project
for accepting or disbursing electronic funds/benefits, taking into consideration the benefits and
costs to the agency, other state agencies, and the general public as a whole.

The following costs were estimated:

1) One-time development cost
2) One-time hardware cost (if the user does not have a computer system)
3) Benefits in reduced cost, errors and savings made by reduction of present system expenses,
time saving and increased accuracy

[2.4.3] OPERATIONAL FEASIBILITY

Operational feasibility is the measure of how well a proposed system solves the problems, takes
advantage of the opportunities identified during scope definition, and satisfies the requirements
identified in the requirements analysis phase of system development.

The operational feasibility assessment focuses on the degree to which the proposed development
project fits in with the existing business environment and objectives with regard to development
schedule, delivery date, corporate culture and existing business processes.

To ensure success, desired operational outcomes must be built in during design and development.
These include such design-dependent parameters as reliability, maintainability, supportability,
usability, producibility, disposability, sustainability, affordability and others.

[2.5] Objectives Of Project

1) Finding the rate of heart disease in human beings.


2) Prediction of heart problems using machine learning algorithms.
3) Representing and measuring the variation of each feature responsible for heart problems.

[Chapter-3] Product Design

3.1) User Requirements


1) The system should ensure the security of the data.
2) The system should reduce human effort and save time.
3) The project should be able to analyze the data and predict results with great precision.
4) The project should enable the user to navigate through the features in the data set.
5) The project should enable the user to create duplicate data if required.
6) The system should also be user-friendly. It should use a Graphical User Interface (GUI).

3.2) DATA FLOW DIAGRAMS

A data dictionary is a structured repository of data about data. It is a set of rigorous definitions of all
DFD data elements and data structures.

[Table-1]

Symbol        Meaning
Entity        A source of data or a destination for data.
Process       A process or task that is performed by the system.
Data store    A place where data is held between processes.
Data flow     A flow of data between entities, processes and data stores.

(The graphical DFD symbols and examples shown in the original table are not reproduced here.)

DFD Levels and Layers


A data flow diagram can dive into progressively more detail by using levels and layers, zeroing in
on a particular piece. DFD levels are numbered 0, 1 or 2, and occasionally go even to Level 3 or
beyond. The necessary level of detail depends on the scope of what you are trying to accomplish.

3.3) Database Design


1) Cholesterol.
2) Glucose.
3) Blood Pressure.

4) Age.

3.3.1) DATABASE MANIPULATION

1) male: 0 = Female; 1 = Male
2) age: Age at the time of examination
3) education: 1 = Some High School; 2 = High School or GED; 3 = Some College or Vocational School;
4 = College
4) currentSmoker: 0 = Non-smoker; 1 = Smoker
5) cigsPerDay: Number of cigarettes smoked per day (estimated average)
6) BPMeds: 0 = Not on blood pressure medication; 1 = On blood pressure medication
7) Prevalent Stroke
8) Diabetes: 0 = No; 1 = Yes
9) totChol: Total cholesterol in mg/dL
10) BMI: Body Mass Index, calculated as Weight (kg) / Height (m squared)
11) heartRate: Beats per minute (ventricular)
12) glucose: Glucose level in mg/dL
13) TenYearCHD
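
As a minimal illustration (not part of the original report), the encodings listed above can be
checked with pandas, assuming the Framingham heart-study CSV from Kaggle is saved locally as
framingham.csv (the file name and exact column names are assumptions based on that public data set):

import pandas as pd

# Load the data (file name is an assumption) and inspect the listed encodings.
df = pd.read_csv("framingham.csv")
for col in ["male", "currentSmoker", "BPMeds", "diabetes", "TenYearCHD"]:
    print(col, sorted(df[col].dropna().unique()))

# Continuous clinical measurements such as totChol, BMI, heartRate and glucose.
print(df[["totChol", "BMI", "heartRate", "glucose"]].describe())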

3.4) ER Diagram
An entity relationship diagram (ERD) shows the relationships of entity sets stored in a database. An
entity in this context is a component of data. In other words, ER diagrams illustrate the logical
structure of databases.
At first glance an entity relationship diagram looks very much like a flowchart. It is the specialized
symbols, and the meanings of those symbols, that make it unique.


3.5) SPECIFIC REQUIREMENT
The functional requirements part discusses the functionalities required from the system. The system
is considered to perform a set of high-level functions {fi}. The functional view of the system is
shown. Each function fi of the system can be considered as a transformation of a set of input data
(ii) into the corresponding set of output data (oi). The user can get some meaningful piece of work
done using a high-level function.

[Chapter-4] Development And Implementation

4.1) Introduction To Languages


4.1.1) PYTHON

Python is an interpreted high-level programming language for general-purpose programming.



Created by Guido van Rossum and first released in 1991, Python has a design philosophy that
emphasizes code readability, notably using significant whitespace. It provides constructs that
enable clear programming on both small and large scales. In July 2018, Van Rossum stepped down
as the leader in the language community after 30 years.
Python features a dynamic type system and automatic memory management. It supports multiple
programming paradigms, including object-oriented, imperative, functional and procedural, and has
a large and comprehensive standard library.
Python interpreters are available for many operating systems. CPython, the reference
implementation of Python, is open source software and has a community-based development
model, as do nearly all of Python's other implementations. Python and CPython are managed by the
non-profit Python Software Foundation.

Python uses dynamic typing, and a combination of reference counting and a cycle-detecting
garbage collector for memory management. It also features dynamic name resolution (late binding),
which binds method and variable names during program execution.
Python's design offers some support for functional programming in the Lisp tradition.
It has filter(), map(), and reduce() functions; list comprehensions, dictionaries, and sets;
and generator expressions. The standard library has two modules (itertools and functools) that
implement functional tools borrowed from Haskell and Standard ML.
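
A minimal sketch of these constructs (plain Python 3, added here for illustration; the numbers are
arbitrary examples):

from functools import reduce

nums = [1, 2, 3, 4, 5]

# filter/map/reduce in the functional style described above
evens = list(filter(lambda n: n % 2 == 0, nums))    # [2, 4]
squares = list(map(lambda n: n * n, nums))          # [1, 4, 9, 16, 25]
total = reduce(lambda a, b: a + b, nums)            # 15

# list comprehension and generator expression equivalents
squares_lc = [n * n for n in nums]
total_gen = sum(n * n for n in nums)

print(evens, squares, total, squares_lc, total_gen)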
The language's core philosophy is summarized in the document The Zen of Python (PEP 20),
which includes aphorisms such as:
1) Beautiful is better than ugly.
2) Explicit is better than implicit.
3) Simple is better than complex.
4) Readability counts.
Rather than having all of its functionality built into its core, Python was designed to be highly
extensible. This compact modularity has made it particularly popular as a means of adding
programmable interfaces to existing applications. Van Rossum's vision of a small core language
with a large standard library and easily extensible interpreter stemmed from his frustrations
with ABC, which espoused the opposite approach.

While offering choice in coding methodology, the Python philosophy rejects exuberant syntax
(such as that of Perl) in favor of a simpler, less-cluttered grammar. As Alex Martelli put it: "To
describe something as 'clever' is not considered a compliment in the Python culture." Python's
philosophy rejects the Perl "there is more than one way to do it" approach to language design in
favor of "there should be one—and preferably only one—obvious way to do it".

4.1.2) MACHINE LEARNING


Machine learning (ML) is a field of artificial intelligence that uses statistical techniques to give
computer systems the ability to "learn" (e.g., progressively improve performance on a specific task)
from data, without being explicitly programmed.
The name machine learning was coined in 1959 by Arthur Samuel. Machine learning explores the
study and construction of algorithms that can learn from and make predictions on data – such
algorithms overcome following strictly static program instructions by making data-driven
predictions or decisions, through building a model from sample inputs. Machine learning is
employed in a range of computing tasks where designing and programming explicit algorithms
with good performance is difficult or infeasible; example applications include email filtering,
detection of network intruders, and computer vision.
Machine learning is closely related to (and often overlaps with) computational statistics, which also
focuses on prediction-making through the use of computers. It has strong ties to mathematical
optimization, which delivers methods, theory and application domains to the field. Machine
learning is sometimes conflated with data mining, where the latter subfield focuses more on
exploratory data analysis and is known as unsupervised learning.
Within the field of data analytics, machine learning is a method used to devise complex models and
algorithms that lend themselves to prediction; in commercial use, this is known as predictive
analytics. These analytical models allow researchers, data scientists, engineers, and analysts to
"produce reliable, repeatable decisions and results" and uncover "hidden insights" through learning
from historical relationships and trends in the data.
Machine Learning Tasks

Machine learning tasks are typically classified into several broad categories:
Supervised learning: The computer is presented with example inputs and their desired outputs,
given by a "teacher", and the goal is to learn a general rule that maps inputs to outputs. As special
cases, the input signal can be only partially available, or restricted to special feedback.
Semi-supervised learning: The computer is given only an incomplete training signal: a training set
with some (often many) of the target outputs missing.
Active learning: The computer can only obtain training labels for a limited set of instances (based
on a budget), and also has to optimize its choice of objects to acquire labels for. When used
interactively, these can be presented to the user for labeling.
Unsupervised learning: No labels are given to the learning algorithm, leaving it on its own to find
structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in
data) or a means towards an end (feature learning).
Reinforcement learning: Data (in form of rewards and punishments) are given only as feedback to
the program's actions in a dynamic environment, such as driving a vehicle or playing a game
against an opponent.

Machine Learning Algorithms

[Figure: A support vector machine is a classifier that divides its input space into two regions,
separated by a linear boundary. Here, it has learned to distinguish black and white circles.]

Among other categories of machine learning problems, learning to learn learns its own inductive
bias based on previous experience. Developmental learning, elaborated for robot learning, generates
its own sequences (also called a curriculum) of learning situations to cumulatively acquire
repertoires of novel skills through autonomous self-exploration and social interaction with human
teachers, using guidance mechanisms such as active learning, maturation, motor synergies, and
imitation.

A core objective of a learner is to generalize from its experience. Generalization in this context is
the ability of a learning machine to perform accurately on new, unseen examples/tasks after having
experienced a learning data set. The training examples come from some generally unknown
probability distribution (considered representative of the space of occurrences) and the learner has
to build a general model about this space that enables it to produce sufficiently accurate predictions
in new cases.
The computational analysis of machine learning algorithms and their performance is a branch of
theoretical computer science known as computational learning theory. Because training sets are
finite and the future is uncertain, learning theory usually does not yield guarantees of the
performance of algorithms. Instead, probabilistic bounds on the performance are quite common.
The bias–variance decomposition is one way to quantify generalization error.
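
For reference, a standard textbook statement of this decomposition (added here for illustration, not
taken from the report) for the expected squared error of an estimator \hat{f} of a true function f at
a point x, with irreducible noise variance \sigma^2, is:

E[(y - \hat{f}(x))^2] = (E[\hat{f}(x)] - f(x))^2 + E[(\hat{f}(x) - E[\hat{f}(x)])^2] + \sigma^2

where the first term is the squared bias and the second term is the variance.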
For the best performance in the context of generalization, the complexity of the hypothesis should
match the complexity of the function underlying the data. If the hypothesis is less complex than the
function, then the model has underfit the data. If the complexity of the model
is increased in response, then the training error decreases. But if the hypothesis is too complex,
then the model is subject to overfitting and generalization will be poorer.
In addition to performance bounds, computational learning theorists study the time complexity and
feasibility of learning. In computational learning theory, a computation is considered feasible if it
can be done in polynomial time. There are two kinds of time complexity results. Positive results
show that a certain class of functions can be learned in polynomial time. Negative results show that
certain classes cannot be learned in polynomial time.
Machine learning, reorganized as a separate field, started to flourish in the 1990s. The field
changed its goal from achieving artificial intelligence to tackling solvable problems of a practical
nature. It shifted focus away from the symbolic approaches it had inherited from AI, and toward
methods and models borrowed from statistics and probability theory.[14] It also benefited from the
increasing availability of digitized information, and the ability to distribute it via the Internet.
Machine learning also has intimate ties to optimization: many learning problems are formulated as
minimization of some loss function on a training set of examples. Loss functions express the
discrepancy between the predictions of the model being trained and the actual problem instances
(for example, in classification, one wants to assign a label to instances, and models are trained to
correctly predict the pre-assigned labels of a set of examples). The difference between the two
fields arises from the goal of generalization: while optimization algorithms can minimize the loss
on a training set, machine learning is concerned with minimizing the loss on unseen samples.
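
In standard notation (added for illustration), the training objective described here is empirical
risk minimization over a training set {(x_i, y_i)}, i = 1, ..., n, with loss L and model parameters
\theta:

\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} L(f_{\theta}(x_i), y_i)

Optimization seeks to make this training loss small, while machine learning additionally cares about
the corresponding loss on unseen data.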

4.2) Implementation With Screen Shots

1) First, we import all the libraries required for the project.
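
The original screenshot is not reproduced here; a minimal sketch of the imports such a script might
use (library choices are assumptions inferred from the later steps: pandas, NumPy, Matplotlib,
seaborn and scikit-learn):

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score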

2) The data set is loaded and its details are shown.
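
A possible sketch of this step (continuing from the imports above and assuming the Framingham data
set is available locally as framingham.csv; the file name is an assumption):

# Load the CSV file into a DataFrame and display its basic details.
df = pd.read_csv("framingham.csv")
print(df.shape)    # number of rows and columns
print(df.head())   # first few records
df.info()          # column names, data types and non-null counts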

3) Rows containing null values are dropped.
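
One common way to do this (a sketch, assuming rows with any missing value are simply discarded):

# Count missing values per column, then drop rows with any null value.
print(df.isnull().sum())
df = df.dropna()
print(df.shape)    # data set size after removing incomplete rows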

4) The data set is arranged in order.

5) The relations between the various features in the data set that are required for disease prediction are examined.
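
The exact plots in the original screenshots are not recoverable; a typical sketch of how such
relationships are visualized (assuming a correlation heatmap and a class-balance plot):

# Correlation between every pair of numeric features, drawn as a heatmap.
corr = df.corr()
plt.figure(figsize=(12, 8))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Feature correlation matrix")
plt.show()

# Class balance of the ten-year CHD target.
sns.countplot(x="TenYearCHD", data=df)
plt.show()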

6) The sklearn library is imported for logistic regression. The data set is divided into two parts in
an 80:20 ratio: 80% for training and 20% for testing.
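
A minimal sketch of this step (assuming TenYearCHD is the target column and all remaining columns
are used as features):

# Separate the features from the TenYearCHD target column.
X = df.drop("TenYearCHD", axis=1)
y = df["TenYearCHD"]

# 80/20 split: 80% of the rows for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Fit a logistic regression classifier on the training portion.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)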

7) Testing

8) Accuracy check
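
A sketch of how the final two steps, testing and the accuracy check, might look (the accuracy value
shown in the original screenshot is not reproduced here):

# Predict on the held-out 20% and check the accuracy of the predictions.
y_pred = model.predict(X_test)
print("Test accuracy: {:.2%}".format(accuracy_score(y_test, y_pred)))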

4.3) Testing

Software testing is an investigation conducted to provide stakeholders with information about the
quality of the product or service under test. Software testing can also provide an objective,
independent view of the software to allow the business to appreciate and understand the risks of
software implementation. Test techniques include the process of executing a program or
application with the intent of finding software bugs (errors or other defects).
Software testing involves the execution of a software component or system component to evaluate
one or more properties of interest. In general, these properties indicate the extent to which the
component or system under test:

1) Meets the requirements that guided its design and development;

2) Responds correctly to all kinds of inputs;

3) Performs its functions within an acceptable time;

4) Is sufficiently usable;

5) Can be installed and run in its intended environments; and

6) Achieves the general result its stakeholders desire.

As the number of possible tests for even simple software components is practically infinite, all
software testing uses some strategy to select tests that are feasible for the available time and
resources. As a result, software testing typically (but not exclusively) attempts to execute a
program or application with the intent of finding software bugs (errors or other defects). The job of
testing is an iterative process as when one bug is fixed, it can illuminate other, deeper bugs, or can
even create new ones.
Software testing can provide objective, independent information about the quality of software and
risk of its failure to users and/or sponsors.
Software testing can be conducted as soon as executable software (even if partially complete)
exists.
The overall approach to software development often determines when and how testing is
conducted.
For example, in a phased process, most testing occurs after system requirements have been defined
and then implemented in testable programs. In contrast, under an Agile approach, requirements,
programming, and testing are often done concurrently.
Data-driven testing (DDT) is a term used in the testing of computer software to describe testing
done using a table of conditions directly as test inputs and verifiable outputs as well as the process
where test environment settings and control are not hard-coded. In the simplest form the tester
supplies the inputs from a row in the table and expects the outputs which occur in the same row.
The table typically contains values which correspond to boundary or partition input spaces. In the
control methodology, test configuration is "read" from a database.

Data-driven testing is the creation of test scripts to run together with their related data sets in a
framework. The framework provides re-usable test logic to reduce maintenance and improve
test coverage. Input and result (test criteria) data values can be stored in one or more central data
sources or databases; the actual format and organization can be implementation-specific.
The data comprises variables used for both input values and output verification values. In advanced
(mature) automation environments, data can be harvested from a running system using a
purpose-built custom tool or sniffer; the DDT framework then performs playback of the harvested data,
producing a powerful automated regression testing tool.
Navigation through the program, reading of the data sources, and logging of test status and
information are all coded in the test script.
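
As an illustration only (not part of the original report), a data-driven test for the prediction
model could be written with pytest, reading inputs and expected outputs row by row from a table; the
predictor below is a stand-in and the patient values are made-up examples:

import pytest

# A stand-in predictor used only to illustrate the data-driven pattern;
# in the real project this would call the trained logistic regression model.
def predict_risk(age, totChol, sysBP, glucose):
    score = 0.02 * age + 0.005 * totChol + 0.01 * sysBP + 0.004 * glucose
    return 1 if score > 3.5 else 0

# Each row of the table: input values plus the expected output for that row.
CASES = [
    (39, 195, 106.0, 77, 0),   # low-risk illustrative values
    (63, 260, 180.0, 215, 1),  # high-risk illustrative values
]

@pytest.mark.parametrize("age,totChol,sysBP,glucose,expected", CASES)
def test_prediction_matches_expected_row(age, totChol, sysBP, glucose, expected):
    assert predict_risk(age, totChol, sysBP, glucose) == expected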

4.3.1 TEST CASE


A test case is a document which has a set of test data, preconditions, expected results and
postconditions, developed for a particular test scenario in order to verify compliance against a
specific requirement. The test case acts as the starting point for test execution; after applying a
set of input values, the application has a definitive outcome and leaves the system at some end
point, also known as the execution postcondition. A test case is a set of conditions or variables
under which a tester will determine whether an application, software system or one of its features is
working as it was originally established to do.

4.3.2 TYPICAL TEST CASE PARAMETERS

1. Test Case ID
2. Test Scenario
3. Test Case Description
4. Test Steps
5. Prerequisite
6. Test Data
7. Expected Result
8. Test Parameters
9. Actual Result
10. Environment

11. Information Comments

4.3.3 TEST SCENARIO

Exhaustive testing is not possible due to the large number of data combinations and the large number
of possible paths in the software. Scenario testing makes sure that the end-to-end functionality of
the application under test is working as expected, and checks that all business flows work as
expected. In scenario testing the tester needs to put himself or herself in the end user's shoes to
check and perform actions the way users would use the application under test. In scenario testing the
preparation of scenarios is the most important part; to prepare the scenarios the tester needs to
consult or take help from the client, stakeholders or developers.

[Chapter-5] Conclusion and Future Scope

5.1) CONCLUSION

Heart disease detection can be of great help to the whole of society. By using one of the above models,
a framework can be developed that can determine heart disease by predicting it from various
features. The heart disease determined can further be used to predict the longevity of an individual
by using one of the machine learning algorithms. These results can be used by physicians for
further diagnosing and treating the patient in case of some heart disorder.
The goals that are achieved by the software are:
1) Optimum utilization of resources.
2) Efficient management of records.
3) Simplification of the operations.
4) Less processing time and getting required information.
5) User friendly.
6) Portable and flexible for further enhancement.
7) Users can manage their accounts and order transactions.
8) A store manager can give various offers, whichever fit best.
9) User can redeem these offers and earn profits.
10) User can order without going around and looking for products.

5.2) FUTURE SCOPE

As no application is perfect, every application needs certain amendments with the passage of time, so
there is a real need for updates to the application. Certain modules and features which are to be
added are:
1) A retailer interface feature is to be inserted wherein, whenever the quantity of any product falls
below 10% of a given amount, a message shall be sent to the retailers asking them for their prices.
The one which gives the lowest cost price can be contacted.
2) Better user interface designs will be introduced, which will enhance the performance of the
application.
3) A common application for all the stores can be developed in which each store is recognized by
a unique code.
4) A generalized version can be formed in which services will not be limited to a single filling
station.

REFERENCE

1) https://adc.bmj.com/content/81/2/172.short
2) https://rpubs.com/ssshinde/Heart_disease_prediction
3) https://www.kaggle.com/amanajmera1/framingham-heart-study-dataset
4) https://www.sciencepubco.com/index.php/ijet/article/view/10557
5) https://ieeexplore.ieee.org/abstract/document/938240/
6) https://www.sciencedirect.com/science/article/pii/089561119500005B
7) https://www.sabudh.org
8) https://www.pucho.com
9) https://www.udemy.com

