
MINI PROJECT REPORT

ON

SMART HEALTH PREDICTOR


B.TECH COMPUTER SCIENCE & ENGINEERING

Submitted by:

Bhavya Chopra - 2000910100049


Hiteshwarm Dubey - 2000910100082
Sarthak Shrivastava - 2000910100161
Shivansh Singh Bisht - 2000910100172

GROUP NO:- G54

Department of Computer Science and Engineering

JSS Academy of Technical Education, Noida

ODD SEM 2022


TABLE OF CONTENTS

S.No Topic

1. INTRODUCTION
2. MOTIVATION
3. OBJECTIVE(S)
4. METHODOLOGY/PROCESS FLOW
5. HARDWARE & SOFTWARE REQUIREMENTS
6. SNAPSHOTS OF PROJECT
7. APPLICATION OF PROJECT
8. CONCLUSION
9. FUTURE SCOPE
10. REFERENCES

INTRODUCTION

❖ Nowadays, humans face various diseases due to the current environmental conditions and their
living habits.

❖ Identifying and predicting such diseases at an early stage is very important for preventing
them from becoming severe.

❖ It is difficult for doctors to prescribe all the various tests for multiple diseases to an
ever-increasing number of patients.

❖ The goal of this project is to identify and predict patients with diseases such as diabetes,
heart disease, and Parkinson's disease.

❖ Our modern healthcare system has faced huge challenges, exacerbated by the pandemic, a rise in
lifestyle-related diseases, and an exploding world population.

❖ The good news is that most large healthcare organizations are beginning to make use of some
form of AI. However, we’re still early in the journey of learning how we can apply artificial
intelligence to make healthcare better.

❖ In healthcare, the most common application of traditional machine learning is precision
medicine – predicting what treatment protocols are likely to succeed on a patient based on
various patient attributes and the treatment context.

❖ The healthcare organizations that will be the most successful are the ones that will be able to
fundamentally rethink and reimagine their workflows and processes and use machine learning
and AI to create a truly intelligent health system.

MOTIVATION

In recent years, some researchers have used various machine learning-based approaches to
develop autonomous disease detection systems, and early disease identification may help to
reduce the number of people who suffer.

The disease detection models aim to bring the medical and artificial intelligence (AI) fields
together so that people can understand how well AI and medicine can work together.

● First, we go over the highlights of and motives for using AI in the healthcare industry.
Following that, we go over machine-learning-based algorithms for integrating AI and the
healthcare sector in depth.

● Next, we go over the technical problems of AI in the medical industry and then show how
machine learning can help. We also look into the impact of machine learning in the medical
field.

● Moreover, we also present several notable initiatives that demonstrate the importance of
machine learning in healthcare applications and services.

Disease detection driven by artificial intelligence (AI) has proven to be an effective tool
for identifying undiagnosed patients with complex common diseases as well as rare diseases.

The use of these algorithms is driven by awareness that underdiagnosis leads to a heavy burden
for patients and healthcare professionals, and is also a challenge for pharmaceutical companies
seeking to expand the patient pool for their medications, whether to power clinical trials or to
efficiently target healthcare.

Developing an effective disease detection algorithm is a multifaceted task involving
technical, clinical, and operational expertise.

OBJECTIVES

The Proposed Work aims to meet the following objectives:

➢ Minimize Risks of Misdiagnosis:

Managing patients' records manually is time-consuming and prone to errors and inaccuracies
that can lead to misdiagnosis. Health apps reduce these risks, some of which might
prove fatal for the patient.

The app stores an accurate record of the patient's health condition digitally.
This helps doctors prescribe the right medicine with the correct dosage and
chemical composition. When a patient moves from one healthcare provider
to another, this data can be retrieved easily to make quick medical decisions.

➢ Accessibility:

Make healthcare more accessible. With mHealth, people living in rural areas,
physically disabled people, and elderly people can easily gain access to medical care.
As for healthcare practitioners, they can reach wider audiences (even at the
international level).

➢ Predictive analytics increase the accuracy of diagnoses:

Physicians can use predictive algorithms to help them make more accurate diagnoses.
For example, when patients come to the ER with chest pain, it is often difficult to
know whether the patient should be hospitalized. If doctors could enter answers to
questions about the patient and their condition into a system with a tested and accurate
predictive algorithm that assesses the likelihood that the patient can be sent
home safely, their own clinical judgment would be aided. The prediction would
not replace their judgment but rather assist it.

METHODOLOGY/PROCESS FLOW

A block diagram of the basic steps adopted for each machine learning model is shown in the figure:

1. Patient Database:
The first step of the project was to choose the datasets that we would use for training
and testing our machine learning models.
We decided to use datasets available on Kaggle and then proceeded to the next step.

2. Data Preprocessing:
Data preprocessing is the process of preparing raw data and making it suitable for a
machine learning model. It is the first and a crucial step in creating a machine learning
model.

Real-world data generally contains noise and missing values, and may be in a format
that cannot be used directly by machine learning models. Data preprocessing
cleans the data and makes it suitable for a machine learning model, which
also increases the accuracy and efficiency of the model.
It involves the following steps:
● Getting the dataset
● Importing libraries
● Importing datasets
● Finding missing data
● Encoding categorical data
● Splitting the dataset into training and test sets
● Feature scaling

This produces a “Cleansed Dataset”, which is then used for training and testing our
machine learning models.
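The preprocessing steps listed above can be sketched in a few lines of dependency-free Python. This is only an illustration under stated assumptions: the records, the column meanings (age, glucose, sex, label), the mean imputation and the 75/25 split are hypothetical and are not taken from the project's Kaggle datasets or code. In practice the same steps would normally be done with library routines.

```python
import random
from statistics import mean, pstdev

# Hypothetical toy records: (age, glucose, sex, label); None marks a missing value.
rows = [
    (50, 148.0, "F", 1),
    (31, 85.0, "M", 0),
    (32, None, "F", 1),
    (21, 89.0, "M", 0),
    (33, 137.0, "F", 1),
    (30, 116.0, "M", 0),
    (26, 78.0, "F", 1),
    (29, None, "M", 0),
]

# 1. Finding missing data: replace a missing glucose value with the column mean.
known = [r[1] for r in rows if r[1] is not None]
glucose_mean = mean(known)
rows = [(r[0], r[1] if r[1] is not None else glucose_mean, r[2], r[3]) for r in rows]

# 2. Encoding categorical data: map the sex column to 0/1.
sex_code = {"F": 0.0, "M": 1.0}
X = [[float(r[0]), r[1], sex_code[r[2]]] for r in rows]
y = [r[3] for r in rows]

# 3. Splitting into training and test sets (75% / 25%) after a seeded shuffle.
indices = list(range(len(X)))
random.Random(0).shuffle(indices)
cut = int(0.75 * len(indices))
train_idx, test_idx = indices[:cut], indices[cut:]
X_train = [X[i] for i in train_idx]
X_test = [X[i] for i in test_idx]
y_train = [y[i] for i in train_idx]
y_test = [y[i] for i in test_idx]

# 4. Feature scaling: standardize using statistics from the training set only,
#    so that no information from the test set leaks into training.
columns = list(zip(*X_train))
means = [mean(c) for c in columns]
stds = [pstdev(c) or 1.0 for c in columns]  # guard against zero variance

def scale(sample):
    """Standardize one sample with the training-set mean and deviation."""
    return [(v - m) / s for v, m, s in zip(sample, means, stds)]

X_train_scaled = [scale(s) for s in X_train]
X_test_scaled = [scale(s) for s in X_test]
```

Computing the scaling statistics on the training set alone is a deliberate choice: the test set then behaves like genuinely unseen data.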

3. Training Models:

Next, we train our machine learning models on the cleansed data.
A machine learning model is defined as a mathematical representation of the output of
the training process.

A model can be understood as a program that has been trained to find patterns within new
data and make predictions. It is represented as a mathematical function that
takes requests in the form of input data, makes predictions on that data, and then
provides an output in response. First, a learning algorithm is run over a set of training
data so that the model can extract patterns from the data it is fed and learn from them.
Once the model has been trained, it can be used to make predictions on unseen data.

There are various types of machine learning models available based on different business
goals and data sets:

● Supervised Learning

● Unsupervised Learning

● Reinforcement Learning

4. Testing Models:
We trained our models using the following algorithms, tested them, and kept the ones with
the highest accuracy.

a) Support Vector Machine (SVM):


Support Vector Machine (SVM) is a popular machine learning algorithm that can be
used for both classification and regression tasks. It is, however, primarily used to
solve classification problems.

The main aim of SVM is to find the best decision boundary in an N-dimensional space
that segregates data points into classes. This best decision boundary is known as a
hyperplane. SVM selects the extreme points (vectors) closest to the boundary to define
the hyperplane, and these vectors are known as support vectors.
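To make the idea of a separating hyperplane concrete, the following is a minimal sketch of a linear SVM trained by sub-gradient descent on the regularized hinge loss. It is an illustration only, not the project's implementation: the six 2-D points, the learning rate, the regularization strength and the function names are assumptions chosen for the example, and a library SVM would normally be used instead.

```python
# Hypothetical standardized 2-D feature vectors with labels in {-1, +1}.
X = [(-2.0, -2.0), (-1.0, -2.0), (-2.0, -1.0), (2.0, 2.0), (1.0, 2.0), (2.0, 1.0)]
y = [-1, -1, -1, 1, 1, 1]

def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=500):
    """Fit the hyperplane w.x + b = 0 by sub-gradient descent on the hinge loss."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), t in zip(X, y):
            margin = t * (w[0] * x1 + w[1] * x2 + b)
            # The L2 regularizer shrinks the weights slightly on every step.
            w[0] -= lr * lam * w[0]
            w[1] -= lr * lam * w[1]
            # The hinge loss is active only for points inside the margin.
            if margin < 1:
                w[0] += lr * t * x1
                w[1] += lr * t * x2
                b += lr * t
    return w, b

w, b = train_linear_svm(X, y)

def decision_value(x):
    """Signed score of a point; its sign gives the side of the hyperplane."""
    return w[0] * x[0] + w[1] * x[1] + b

def predict(x):
    return 1 if decision_value(x) >= 0 else -1

# The training point with the smallest margin lies closest to the hyperplane;
# such extreme points play the role of the support vectors.
margins = [t * decision_value(x) for x, t in zip(X, y)]
closest_point = X[margins.index(min(margins))]
```

Calling `predict((3.0, 3.0))` classifies a new point by checking which side of the learned hyperplane it falls on.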

b) Decision Tree:

● A decision tree is a supervised learning algorithm that is mainly used to solve
classification problems but can also be used for solving regression problems. It can
work with both categorical variables and continuous variables.
● It has a tree-like structure that includes nodes and branches, and starts with the root
node that expands on further branches till the leaf nodes. The internal nodes represent
the features of the dataset, branches show the decision rules, and leaf nodes
represent the outcome of the problem.
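How a single internal node picks its decision rule can be sketched as follows. The example uses Gini impurity as the splitting criterion, which is one common choice and an assumption here (the report does not state which criterion was used). The glucose readings and outcome labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best rule 'value <= threshold'."""
    best = (None, float("inf"))
    ordered = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(ordered, ordered[1:]):
        threshold = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical glucose readings with a 0/1 outcome label.
glucose = [85, 89, 95, 110, 137, 148, 160, 183]
outcome = [0, 0, 0, 0, 1, 1, 1, 1]

threshold, impurity = best_split(glucose, outcome)
# The chosen rule becomes the branch condition; each side becomes a child node.
```

A full tree repeats this search recursively on each child node until the leaves are pure or a depth limit is reached.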

c) Logistic Regression:

● Logistic regression is a significant machine learning algorithm because it can
provide probabilities and classify new data using both continuous and discrete features.
● Logistic regression can be used to classify observations using different types of data
and can easily determine the most effective variables for the classification.
● Logistic regression is a supervised learning algorithm used to predict categorical
variables or discrete values. It can be used for classification problems in machine
learning, and the output of the logistic regression algorithm can be either Yes or No,
0 or 1, Red or Blue, etc.
● Logistic regression is similar to linear regression except in how the two are used:
linear regression is used to solve regression problems and predict continuous values,
whereas logistic regression is used to solve classification problems and predict
discrete values.
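The link between the predicted probability and the discrete output can be illustrated with a small one-feature logistic regression fitted by batch gradient descent. This is a sketch under assumptions: the data points, learning rate and number of epochs are made up for the example and do not come from the project.

```python
import math

def sigmoid(z):
    """Map any real score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical standardized feature with binary labels (the classes overlap slightly).
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)

def predict_proba(x):
    """Probability that the label is 1 for feature value x."""
    return sigmoid(w * x + b)

def predict(x):
    """Discrete prediction: threshold the probability at 0.5."""
    return 1 if predict_proba(x) >= 0.5 else 0
```

The model itself outputs a continuous probability. The Yes/No answer comes from thresholding it, and the threshold can be moved if false negatives are more costly than false positives.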

5. Collection of Results:

The quantity and quality of the collected data determine the quality of the output. In
general, the more representative data is available, the more accurate the predictions will be.

This step includes the following tasks:

● Identify various data sources
● Collect data
● Integrate the data obtained from different sources

Then we compare and analyze the performance of our models and improve our results.
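Comparing models by accuracy on a held-out test set can be sketched as below. The labels and predictions are placeholder values invented for the illustration, and the model names are deliberately generic. They are not results from this project.

```python
def accuracy(y_true, y_pred):
    """Fraction of test samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Placeholder test labels and predictions from three hypothetical models.
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [1, 0, 1, 1, 0, 1, 1, 0],
    "model_b": [1, 0, 0, 1, 0, 1, 1, 0],
    "model_c": [1, 0, 1, 0, 0, 0, 1, 1],
}

# Score every model on the same test set and keep the most accurate one.
scores = {name: accuracy(y_test, pred) for name, pred in predictions.items()}
best_model = max(scores, key=scores.get)
```

Accuracy alone can be misleading when one class is rare, so for medical data it is usually worth also checking how many positive cases each model misses.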

6. Deployment:
If the model prepared above produces accurate results as per our requirements with
acceptable speed, we deploy it in the real system. Before deploying the project,
we check whether its performance improves with the available data. The
deployment phase is similar to making the final report for a project.

HARDWARE & SOFTWARE REQUIREMENTS

1. Python :

https://www.python.org

Python is an interpreted, object-oriented, high-level programming language with dynamic
semantics. Its high-level built-in data structures, combined with dynamic typing and dynamic
binding, make it very attractive for Rapid Application Development, as well as for use as a
scripting or glue language to connect existing components together.

Python's simple, easy-to-learn syntax emphasizes readability and therefore reduces the cost of
program maintenance.

Python supports modules and packages, which encourages program modularity and code reuse.
The Python interpreter and the extensive standard library are available in source or binary form
without charge for all major platforms, and can be freely distributed.

2. NumPy:

https://numpy.org

NumPy is the fundamental package for scientific computing in Python. It is a Python library that
provides a multidimensional array object, various derived objects (such as masked arrays and
matrices), and an assortment of routines for fast operations on arrays, including mathematical,
logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear
algebra, basic statistical operations, random simulation and much more.

At the core of the NumPy package is the ndarray object. This encapsulates n-dimensional arrays
of homogeneous data types, with many operations being performed in compiled code for
performance.

NumPy fully supports an object-oriented approach, starting, once again, with ndarray. For
example, ndarray is a class, possessing numerous methods and attributes. Many of its methods
are mirrored by functions in the outermost NumPy namespace, allowing the programmer to code
in whichever paradigm they prefer. This flexibility has allowed the NumPy array dialect and
NumPy ndarray class to become the de facto language of multi-dimensional data interchange
used in Python.

3. Scikit-Learn (Sklearn):

https://scikit-learn.org/stable/

Scikit-learn (sklearn) is one of the most widely used and robust libraries for machine learning
in Python. It provides a selection of efficient tools for machine learning and statistical
modeling, including classification, regression, clustering and dimensionality reduction, via a
consistent interface in Python. This library, which is largely written in Python, is built upon
NumPy, SciPy and Matplotlib.

Rather than focusing on loading, manipulating and summarizing data, the scikit-learn library is
focused on modeling the data. Some of the most popular groups of models provided by sklearn
are as follows −

● Supervised Learning algorithms − Almost all the popular supervised learning
algorithms, like Linear Regression, Support Vector Machine (SVM), Decision Tree etc.,
are part of scikit-learn.

● Unsupervised Learning algorithms − It also has the popular unsupervised learning
algorithms, from clustering, factor analysis, and PCA (Principal Component Analysis)
to unsupervised neural networks.

● Clustering − This model is used for grouping unlabeled data.

● Cross Validation − It is used to check the accuracy of supervised models on unseen data.

● Dimensionality Reduction − It is used for reducing the number of attributes in data,
which can be further used for summarisation, visualization and feature selection.

● Ensemble methods − As the name suggests, it is used for combining the predictions of
multiple supervised models.

● Feature extraction − It is used to extract the features from data to define the attributes in
image and text data.

● Feature selection − It is used to identify useful attributes to create supervised models.

● Open Source − It is an open source library and also commercially usable under the BSD
license.

4. Kaggle :

https://www.kaggle.com

Kaggle, a subsidiary of Google LLC, is an online community of data scientists and machine
learning practitioners. Kaggle allows users to find and publish data sets, explore and build
models in a web-based data-science environment, work with other data scientists and machine
learning engineers, and enter competitions to solve data science challenges.
Kaggle launched in 2010 with machine learning competitions and now also
offers a public data platform, a cloud-based workbench for data science, and Artificial
Intelligence education. Its key personnel were Anthony Goldbloom and Jeremy Howard.
Nicholas Gruen was the founding chair succeeded by Max Levchin. Equity was raised in 2011
valuing the company at $25.2 million. On 8 March 2017, Google announced that they were
acquiring Kaggle.

5. Streamlit.io :

https://streamlit.io

Streamlit is an open-source Python library that makes it easy to create and share
beautiful, custom web apps for machine learning and data science in a short time.

It is compatible with major Python libraries such as scikit-learn, Keras, PyTorch,
SymPy (LaTeX), NumPy, pandas, and Matplotlib. With Streamlit, no callbacks are needed
since widgets are treated as variables. Data caching simplifies and speeds up computation
pipelines. When an app is deployed from a linked Git repository, Streamlit watches the
repository for updates and automatically redeploys the application at the shared link.

6. Excel:

https://support.microsoft.com/en-us/excel

Excel is a spreadsheet program from Microsoft and a component of its Office product group
for business applications. Microsoft Excel enables users to format, organize and calculate
data in a spreadsheet.

By organizing data using software like Excel, data analysts and other users can make information
easier to view as data is added or changed. Excel contains a large number of boxes called cells
that are ordered in rows and columns. Data is placed in these cells.
Excel is a part of the Microsoft Office and Office 365 suites and is compatible with other
applications in the Office suite. The spreadsheet software is available for Windows, macOS,
Android and iOS platforms.

Microsoft Excel allows you to examine and interpret data in a variety of ways. The information
could come from several different places. A variety of formats and conversions are available for
the data. Conditional Formatting, Ranges, Tables, Text functions, Date functions, Time functions,
financial functions, Subtotals, Quick Analysis, Formula Auditing, Inquire Tool, What-if
Analysis, Solvers, Data Model, PowerPivot, PowerView, PowerMap, and other Excel
commands, functions, and tools can all be used to analyze it.

SNAPSHOTS OF PROJECT

APPLICATIONS OF PROJECT

The use of machine learning models in healthcare has increased. The ability of machine learning
models to extract meaning from data and make predictions is used for the early prediction of
diseases. Machine learning is applied to heart disease to solve complex problems. For instance,
data mining techniques are applied to heart disease data to determine patterns and help in the
prediction of heart disease. In another study, a hybrid of machine learning models is proposed for
the diagnosis of heart disease.

❖ Research and prediction of disease
❖ Automation of hospital administrative processes
❖ Early detection of disease
❖ Prevention of unnecessary doctor’s visits
❖ Discovery of new drugs
❖ More accurate calculation of health insurance rates
❖ More effective sharing of patient data
❖ Personalization of patient care

CONCLUSION

● We have summarized different data mining and machine learning techniques for predicting the
occurrence of heart disease.

● We have determined the prediction performance of each algorithm and applied the proposed
system where it was needed. More relevant feature selection methods could improve the
accuracy of the algorithms. Several treatment methods are available to patients once they are
diagnosed with a particular form of heart disease. Data mining can extract much knowledge from
a suitable dataset.
● As identified through the literature survey, only marginal success has been achieved in creating
predictive models for heart disease patients. Hence there is a need for combined and more
complex models to increase the accuracy of predicting the early onset of heart disease. As
more data is fed into the database, the system's predictions should improve.

● There are many possible improvements that could be explored to improve the scalability and
accuracy of this prediction system. We would like to test different discretization
techniques, multiple-classifier voting techniques, and decision tree types based on different
splitting criteria, namely information gain and gain ratio.

● We also intend to explore other techniques such as association rules, logistic regression, and
clustering algorithms.

Our team has created a GitHub repository for our project. The link is:

https://github.com/SarthakShri/SmartHealthPredictor

We also successfully deployed our web app, which can be accessed via the link:

https://sarthakshri-smarthealthpredictor-mlprojmodel-3trylk.streamlit.app

FUTURE SCOPE

● These are only a few of the many areas in which machine learning can help the healthcare
industry. With machine learning applications, the healthcare and medicine segment can
advance into a new realm and completely transform healthcare operations.

● The findings of this project can be helpful in the early screening of potential diabetes and
heart disease patients. The first screening can be performed in the
comfort of the patient's home. If a high risk of disease is predicted for a patient, it can be
followed by clinical tests for confirmation.

● In the near future, one may apply the proposed model to other applications such as
handwriting recognition, image filtering, cancer classification, and medical image segmentation,
and may additionally use various meta-heuristic techniques to tune the initial parameters of the
proposed machine learning models.

● The healthcare domain is dynamic, which is a challenge for data mining but also a strong
motivation for data mining applications in healthcare. This dynamism opens up
new horizons, and more data mining applications will be employed to discover new patterns and
associations.

● In view of the subjects examined in this project, future data mining studies are likely to
focus largely, though not exclusively, on distributed data mining applications and text
mining algorithms. Data mining algorithms help increase classification performance.
This can be further enhanced and expanded with more prediction algorithms for major
life-threatening diseases.

● In the future, we plan to experiment with more feature selection methods, such as ant
colony optimization and particle swarm optimization, to further improve system performance. We
may also develop a system to diagnose heart disease using deep learning methods.
Furthermore, there is a possibility to extend the proposed methodology to diagnose other chronic
diseases such as chronic kidney disease and cancer. It is also planned to deploy the system under
the supervision of a doctor to test the performance of the system through real data in real time. In
addition, the proposed system can be extended using IoT devices for the collection of clinical
parameters in real-time.
