
CYBER MEDICINE AND

RECOMMENDATION

A Major Project

Submitted in partial fulfillment of the requirement for the award of the degree of

Bachelor of Technology
In

COMPUTER SCIENCE AND ENGINEERING

By

AMAN AGGARWAL (11019210037)


CHIRAYU DHINGRA (11019210028)
NITIN KAUSHIK (11019210035)

Under Supervision of
Ms. Kanika Pahwa
(Assistant Professor)
Department of Computer Science & Engineering

SRM UNIVERSITY DELHI-NCR


Plot No.39, Rajiv Gandhi Education City, Sonepat, Haryana – 131029
May 2023
Approval Sheet

This project work entitled "Cyber Medicine and Recommendation" by Aman Aggarwal, Chirayu Dhingra
and Nitin Kaushik of course - B. Tech, branch - CSE, year – 2023, semester - 8th, in major project
(CS4114) during the academic session 2022-2023, is approved for the degree of Bachelor of Technology
(B. Tech (CSE) with specialization in AI and Data Science).

Examiner(s)

Supervisor (s)

Ms. Kanika Pahwa

Head of the Department

Dr. Puneet Goswami

Date:

Place: ____________

CANDIDATE’S DECLARATION
We hereby declare that the project entitled "Cyber Medicine and Recommendation" submitted for the B. Tech.
(CSE - C) degree is our original work and that the project has not formed the basis for the award of any other
degree, diploma, fellowship or any other similar title. We hereby declare that this written submission
represents our own ideas in our own words and, where others' ideas or words have been included, they have been
adequately cited and the original sources referenced. We also declare that we have adhered to all principles of
academic honesty and integrity and have not misrepresented, fabricated or falsified any idea/data/fact/source
in our submission.

Aman Aggarwal
Reg No. 11019210037

Chirayu Dhingra
Reg No. 11019210028

Nitin Kaushik
Reg No. 11019210035

CERTIFICATE
This is to certify that the project titled "Cyber Medicine and Recommendation" is the bonafide work carried out
by Aman Aggarwal, Chirayu Dhingra and Nitin Kaushik, students of B. Tech. (CSE - C) of SRM University
Delhi-NCR, Sonipat, Haryana-131029, during the academic year 2019-2023, in partial fulfillment of the
requirements for the award of the degree of Bachelor of Technology (Computer Science and Engineering), and
that the project has not previously formed the basis for the award of any other degree, diploma, fellowship
or any other similar title.

Signature of HOD Signature of the Guide

Place: Sonipat

ACKNOWLEDGEMENT
Most importantly, we would like to express our immense gratitude to our supervisor, Ms. Kanika Pahwa, for the
patient guidance, encouragement, training and advice she has provided throughout this time. The knowledge we
have gained during this period will stay with us for years to come. We have been extremely lucky to
have a supervisor who cared so much about our work and responded to our questions and queries so promptly. It was a
great privilege to collaborate with her and work under her guidance. Though the following project and
report is our own work, we could never have explored the depths of the topic without the help, support,
guidance and efforts of our able supervisor. Her infectious enthusiasm and unlimited zeal have been major
driving forces throughout the work.

Aman Aggarwal
Reg No. 11019210037

Chirayu Dhingra
Reg No. 11019210028

Nitin Kaushik
Reg No. 11019210035

ABSTRACT

The modern world places a high value on one's physical and mental well-being. People now prefer to have
a healthy lifestyle, which may be attained by engaging in regular exercise and eating a balanced diet.
However, due to their busy schedules, people in today's fast-paced society have less time to care about their
health. People then start looking online for information about the diseases they have, what to do to treat
them, and what medications to take.

The issue lies in the abundance of unreliable health information available online. People begin to believe the
information posted on the internet, YouTube, and other places, and therefore develop an incorrect
understanding of their ailment and how to treat it. Moreover, many times the doctor treating the patient does
not explain what type of diet and exercise to follow.

This is where our project, cyber medicine and recommendation, comes in. Our project offers people
recommendations regarding the disease they have, covering diet, medicine and exercise. It also offers a
service through which people can get alternative names for a medicine they are using, in case it is
unavailable to them. In addition, the project predicts the patient's disease from their symptoms and gives
them some insights regarding that disease.

CONTENTS

1. Approval Sheet.................................................................................................................... i

2. Candidates Declaration..................................................................................................... ii

3. Certificate........................................................................................................................... iii

4. Acknowledgement............................................................................................................... iv

5. Abstract............................................................................................................................... v

6. Introduction……………………………………………………………...………………...1
1.1 Introduction to the project 1

7. Literature Survey………………………………………………………………………….3
2.1 Literature Survey 3
2.2 Summary 9
2.3 Integrated Summary 9
2.4 Problem Statement 10

8. Proposed Methodology………………………………………………………………….11
3.1. Overall description of the Project 11
3.2. Functional requirements 11
3.3. Block diagram 24
3.4. Algorithms 26

9. Result and Discussion……………………………………….............................................32


4.1. Implementation details and Issues 32
4.2.Evaluation Parameters 32
4.3. Results 38

10. Conclusion and Future Works………………………………………………………….43


5.1. Findings 43
5.2. Future Work 43

11. References………………………………………………………………………………..44

12. Plagiarism Report………………………………………………………………………..45

LIST OF TABLES

Table 2.1 Literature Survey Analysis……………………………………………………….….............................3

Table 4.1 Result of Logistic Regression on disease datasets…………………………..….……………….…….38

Table 4.2 Result of Random forest classifier on disease datasets………………………..…………….………..38

Table 4.3 Result of Knn classifier on disease datasets…………………………………………………....….….39

Table 4.4 Result of Decision tree classifier on disease datasets………………………………..………...….…39

Table 4.5 Result of Support vector machine on disease datasets………………………………..…….……...…40

Table 4.6 Result of Gaussian NB on disease datasets……………………………………………………...…...40

Table 4.7 Result of Bernoulli NB on disease datasets……………………………………………..………...….41

Table 4.8 Result of Gradient boosting classifier on disease datasets……………………………………………41

Table 4.9 Result of CNN model on pneumonia disease dataset…………………………………………..……..42

Table 4.10 Result of Vgg16 model on pneumonia disease dataset………………….…………………….…….42

LIST OF FIGURES

Figure 3.1 Flow of the web application……………………………………..…………………..….…….…..…24

Figure 3.2 Diagram for Admin/User login Procedure……………………………………..…….……..….……25

Figure 3.3 Working of disease Prediction………………………………………..……….……………..……...25

Figure 3.4 Working of symptom based disease prediction……………………….……………………………..26

CHAPTER 1: INTRODUCTION

1.1 Introduction to the Project

Health is a crucial aspect of our lives, and the prevalence of ill-health represents a significant
humanitarian issue that comes from various political, economic, and social factors. Unfortunately, in
today's fast-paced world, many individuals neglect their health due to their hectic daily routines.
Consequently, people of all age groups are increasingly affected by diseases such as diabetes and stroke,
leading to a decline in overall life expectancy. To address this pressing need for improved health
management, our team has developed a cyber medicine and recommendation project.

This project aims to guide individuals towards a healthier lifestyle by using algorithms and
recommendations. When a patient enters their medical details pertaining to a specific disease they wish to
assess, our project performs an analysis. Using algorithms, it then generates exercise recommendations that
are best suited to the individual's condition. Additionally, our application provides a list of meals that the
patient should consume to enhance their well-being, along with a list of foods they should avoid. Moreover,
the project offers medication suggestions necessary for the patient to improve their health.

Another valuable feature of our project is the alternative medicine recommendation section. Here, patients
can enter the name of a particular medication they are currently prescribed, and our system will generate a
list of suitable alternative medicines. This functionality ensures that patients have access to a wider range of
options.

Moreover, our project includes a basic forum where patients can describe their symptoms and the duration
they have been experiencing them. With this information, along with any additional minor symptoms
provided, our system predicts the potential underlying disease. This forum allows individuals to gain
preliminary insights into their health issues and seek further medical attention accordingly.

To underscore the urgency and relevance of our project, let us consider some statistics. According to recent
data, the global prevalence of diabetes has reached an alarming 9.3% among adults, affecting
approximately 463 million individuals worldwide. Additionally, stroke is the second leading cause of death
globally, accounting for an estimated 11% of total deaths.

Chronic diseases, such as heart disease, cancer, diabetes, and respiratory diseases, are the leading cause of
death and disability worldwide. According to the World Health Organization (WHO), chronic diseases
account for approximately 71% of all deaths globally, with 85% of these deaths occurring in low- and
middle-income countries.

Diabetes is a significant public health concern. The International Diabetes Federation reports that the
number of adults living with diabetes is expected to rise from 463 million in 2019 to 700 million by 2045.
This alarming increase emphasizes the urgency of effective health management and preventive measures.

Cardiovascular diseases, including stroke and coronary artery disease, remain the leading cause of death
worldwide. The American Heart Association states that approximately 17.9 million deaths occur each year
due to cardiovascular diseases, accounting for 31% of all global deaths.

Access to healthcare services and resources is unevenly distributed worldwide. Many individuals,
particularly in low-income communities, face barriers such as limited access to healthcare facilities,
inadequate health education, and financial constraints. Projects like cyber medicine and recommendation
can help bridge this gap by providing accessible and personalized health information.

These statistics highlight the critical need for health management solutions that can help individuals
prevent, manage, and improve their overall well-being.

In summary, our cyber medicine and recommendation project aims to address the growing health
challenges faced by people across different age groups. By providing exercise recommendations, dietary
guidelines, medication suggestions, alternative medicine options, and a symptom-based disease prediction
system, we strive to empower individuals to make informed decisions about their health. By leveraging
technology, we aspire to enhance life expectancy and contribute to a healthier society overall.

CHAPTER 2: LITERATURE SURVEY

2.1 Literature Survey

Table 2.1: Literature survey analysis


1. Vedna Sharma, Surender Singh Samant (2020). "Health Recommendation System by using deep learning and fuzzy technique"
Proposed: The proposed system model comprises various stages like data collection, execution and production of output to give an accurate decision in a healthcare recommendation system. A Deep Neuro Fuzzy based technique is proposed for risk and severity prediction. The data collected from various data resources is further processed for recommendation and decision making by doctors or patients. This model comprises various domain and analytical tools.
Advantages: Provides recommendations to the patients and risk analysis based on the severity of diseases using the fuzzy technique. The implemented systems have an accuracy of 90%.
Gaps: Only heart, kidney and liver diseases are analyzed.
Scope: In future work, real-time data can be utilized through IoT sensors for evaluation purposes. To increase the safety in the validation phase there is a need to implement neural network technology.

2. Sonal Jain, Harshita Khangarot, Shivank Singh (2019). "Journal Recommendation System Using Content-Based Filtering"
Proposed: A Journal Recommendation System (JRS) is proposed, which will solve the problem of publication for many authors.
Advantages: Linear discriminant analysis is used, which is an advantage when a large number of parameters is taken into account.
Gaps: No result regarding the overall accuracy of the model was discussed.
Scope: Other learning techniques can also be used as pending research to improve performance. Additionally, hybrid methods can be used to provide more scoring.

3. Anand Kumar, Ganesh Kumar Sharma, Prakash U M (2021). "Disease Prediction and Doctor Recommendation System using Machine Learning Approaches"
Proposed: Implemented doctor recommendation, which tells the user to consult a specific type of doctor after predicting the disease. Created an authentication system including a data collection portal for collecting the datasets for building ML models.
Advantages: The system could reduce medical errors and improve patient results. Simple method of getting to the application for disease prediction and doctor recommendation.
Gaps: Every algorithm gives different accuracy for different diseases. No single algorithm can be chosen for all diseases.
Scope: In the future, the model can be improved by acquiring large and massive datasets directly from hospitals and implementing deep learning algorithms. The entire project can be implemented as an Android application to make it more accessible to more users.

4. Sadia Ali, Yaser Hafeez, Mamoona Humayun, Nor Shahida Mohd Jamail, Muhammad Aqib, Asif Nawaz (2021). "Enabling recommendation system architecture in virtualized environment for e-learning"
Proposed: Proposed a virtual agent and recommendation-based architecture called ELRA to address the problems of education in an e-learning environment. The proposed ELRA is designed to help teachers and learners/students find courses that are a good fit for their interests and preferences.
Advantages: Compared to other conventional techniques, the experimental results show that ELRA techniques greatly increase skills, accomplishments and learning success by more than 90%.
Gaps: With a lesser number of courses students get bored. The interface does not work well with quizzes and slot booking.
Scope: Future research will focus on making ELRA continuously update the syllabus and learning materials, as well as verifying the progress of both learners and teachers, to improve the course quality in an online learning environment, and on using better algorithms to improve ELRA accuracy.

5. Divya Mogaveera, Vedant Mathur, Sagar Waghela (2021). "e-Health Monitoring System with Diet and Fitness Recommendation using Machine Learning"
Proposed: Proposed a system that aims at improving the health of patients suffering from various diseases by recommending healthier diet and exercise plans, by analyzing and monitoring health parameters and the values from their latest reports related to the disease.
Advantages: Helps the doctor in making more accurate decisions.
Gaps: It only deals with the health monitoring of diseases like diabetes, blood pressure and thyroid. The C4.5 decision tree algorithm used needs some improvements.
Scope: Future work will include a tracking system for diet and exercise, alternative preferences based on user sentiment towards particular food items when the user's preferences change, and a regular and urgent warning system to remind the user before each monitoring session and to warn the user in extreme cases.

6. Jong-Hun Kim, Jung-Hyun Lee, Jee-Song Park, Young-Ho Lee, Kee-Wook Rim (2020). "Design of Diet Recommendation System for Healthcare Service Based on User Information"
Proposed: This study proposes a personalized diet recommendation service for users who require prevention and management of coronary heart disease.
Advantages: Coronary heart disease can be prevented easily at home. Users can receive recommendations of various diets that consider their food preferences.
Gaps: The proposed model is limited to heart disease, so it cannot recommend diets for patients with other diseases.
Scope: A context recognition middleware for health management can be developed that obtains the activity and diet-pattern information (meal time, number of meals, and amount of meals) of users.

7. Wahidah Husain, Lee Jing Wei, Sooi Li Cheng and Nasriah Zakaria (2020). "Application of Data Mining Techniques in a Personalized Diet Recommendation System for Cancer Patients"
Proposed: Proposed a Personalized Diet Recommendation System for Cancer Patients to help patients manage their daily food intake.
Advantages: The proposed system will provide advice to the patient in the form of the total nutrition components to be taken daily, as well as suggested dishes for the type of diet menu corresponding to the total nutrition advised.
Gaps: It could only generate a diet menu on a daily basis. This is due to the limited number of data about the dishes available in the database, which is insufficient for generating a week's menu with unique dishes that achieve weekly nutritional requirements.
Scope: To use a database which has a large dataset of dishes. The database should be enhanced with more complete nutritional information about each dish, so that the algorithm coverage could be wider, preserving a higher accuracy result from the system.

8. Akash Maurya, Rahul Wable, Rasika Shinde, Sebin John, Rahul Jadhav, Dakshayani. R (2019). "Chronic Kidney Disease Prediction and Recommendation of Suitable Diet plan by using Machine Learning"
Proposed: The proposed system extracts the features which are responsible for chronic kidney disease, then uses machine learning to automate the classification of chronic kidney disease into different stages according to its severity.
Advantages: The diet plan is created according to the potassium levels of the patient's kidney. Other conditions like diabetes and high blood pressure are also considered while recommending the diet.
Gaps: No clear mention of the algorithm being used or data on the accuracy of the model.
Scope: To use a database which has a large dataset of dishes; the database should be enhanced with more complete nutritional information about each dish, especially its potassium levels, so that the algorithm coverage could be wider.

9. Brunda R, Preethi K S, Sushmitha N Reddy, Khushi Aralikatte, Dr. Manjula S (2022). "B-Fit: A Fitness and Health Recommendation System"
Proposed: Proposed a fitness recommender system called B-Fit. The aim of the proposed system is to provide users with recommended videos that are both relevant and diverse.
Advantages: Classifies the user as healthy or unhealthy based on their age, weight, height and blood parameters such as RBC, WBC, haemoglobin, platelets, sugar, etc. Food recommendation is based on the user's BMI.
Gaps: The dataset used is small and the food dataset used is not particular to Indian cuisine.
Scope: The accuracy and diversity of the recommended videos can be improved by using more profound datasets, e.g., datasets regarding Indian cuisine, and by providing recommendations of short videos.

2.2 Summary

● The system model that is being presented includes a number of steps, including data gathering, execution,
and output creation, to provide correct decision-making in the healthcare recommendation system. A
method based on deep neurofuzzy theory is suggested for predicting risk and severity. The information
gathered from diverse data sources is further analysed to help physicians and patients make
recommendations and decisions. This model includes a variety of analytical and domain tools.

● A Journal Recommendation System (JRS) is suggested, which will help many authors who are having
publication issues.

● Doctor recommendations have been included, instructing users to consult a particular kind of doctor after
predicting the disease. An authentication system with a data collection portal was created to gather the datasets
needed to build machine learning models.

● ELRA, a virtual agent and recommendation-based architecture, was proposed to address the issues with
education in an e-learning environment. The suggested ELRA was created to assist teachers and
learners/students in locating courses that are a good fit for their interests and preferences.

● A system was proposed that analyses and keeps track of patients' health metrics and the data from their most
recent disease-related reports, with the goal of advising healthier food and activity routines to patients
suffering from various ailments.

● For users who need coronary heart disease prevention and care, this study suggests a personalised diet
recommendation service.

● To assist patients in controlling their daily food intake, a personalised diet recommendation system for
cancer patients was proposed.

● The suggested technique extracts the characteristics that cause chronic kidney disease and uses machine
learning to automatically classify the condition into distinct phases based on its severity.

● The B-Fit fitness recommendation system was proposed. The suggested method intends to offer viewers
recommendations for videos that are both interesting and diverse.

2.3 Integrated Summary

Several key points can be extracted from the provided statements regarding healthcare recommendation
systems. Firstly, a deep neuro fuzzy-based approach is suggested for predicting disease risk and severity by
analyzing diverse data sources, aiding physicians and patients in making informed decisions. Additionally,
a Journal Recommendation System (JRS) is proposed to assist authors struggling with publication issues,
guiding them towards suitable journals for their research papers. Furthermore, doctor recommendations and
an authentication system are implemented to facilitate the collection of datasets for building machine
learning models. The e-learning environment is addressed through an E-Learning Recommendation
Architecture (ELRA), incorporating a virtual agent to help teachers and students find relevant courses
based on their preferences. Other proposed systems include personalized diet recommendations for
coronary heart disease and cancer patients, as well as a fitness recommendation system and a machine
learning-based approach for classifying chronic kidney disease. These studies collectively emphasize the
importance of leveraging data analysis, machine learning, and personalized recommendations to enhance
healthcare decision-making, education, and patient care.

2.4 Problem Statement

⮚ In today's fast-paced world, individuals often find themselves grappling with various conditions like chest
pain, diabetes, malaria, pneumonia, and more. However, their busy schedules leave them with very little
time to visit a doctor for proper diagnosis and treatment.

⮚ Due to time constraints, people often start searching for medications online, attempting to self-diagnose
without proper knowledge of their underlying conditions. Unfortunately, this can lead to the ingestion of
incorrect medicines, resulting in adverse reactions and a combination of medication side effects and actual
disease symptoms.

⮚ Moreover, individuals residing in hilly or remote areas face significant challenges when it comes to
accessing adequate healthcare facilities. Limited availability of treatment centers or their distant locations
increases the difficulties faced by these communities, making it harder for them to receive timely medical
attention.

CHAPTER 3: PROPOSED METHODOLOGY

3.1. Overall description of the Project

In our project, we are undertaking three distinct tasks. The first task involves predicting whether a patient
has a particular disease based on the medical information they provide. The patient will enter values
corresponding to specific information requested by us. The diseases we are predicting include Breast
cancer, Heart-disease, diabetes, pneumonia, stroke, malaria, and liver disease. To achieve accurate disease
prediction, we are utilizing multiple machine learning algorithms for Breast cancer, Heart-disease, diabetes,
stroke, and liver disease. Through this approach, we aim to determine the most effective algorithm, which
will enhance the overall accuracy of the predictive model. For pneumonia, we have chosen to employ the
VGG-16 model, which is a deep convolutional neural network consisting of 16 layers. Similarly, for
malaria, we have utilized the VGG-19 model, which is a deep convolutional neural network with 19 layers.
Incorporating these specific models will facilitate disease prediction by utilizing the patients' lung X-ray
images for pneumonia and their blood images for malaria, thereby improving the accuracy of the
predictions.
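
As an illustration of this multi-algorithm comparison, the sketch below trains a few scikit-learn classifiers on one tabular disease dataset and reports their test accuracy; the file name, label column, and choice of models are illustrative placeholders, not the project's actual code.

```python
# Minimal sketch: comparing several classifiers on one tabular disease dataset.
# Assumes a CSV with feature columns and a binary "target" label (hypothetical file name).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

df = pd.read_csv("heart_disease.csv")          # hypothetical dataset path
X, y = df.drop(columns=["target"]), df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "SVM": SVC(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {acc:.3f}")                # pick the best-performing model per disease
```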

The second task in our project involves providing patients with alternative names or substitute options for
the medicines they input. Additionally, we have implemented an auto-correction feature, ensuring that even
if the patient enters the medicine name incorrectly or partially, the model will automatically correct it and
provide suitable alternatives. This functionality aims to assist patients in finding appropriate substitutes for
their prescribed medicines, enhancing convenience and accessibility.
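
A minimal sketch of the auto-correction idea is shown below, using Python's built-in difflib to match a misspelled or partial medicine name against a known list; the medicine names and the substitute mapping are illustrative placeholders, not the project's actual database.

```python
# Minimal sketch of medicine-name auto-correction and substitute lookup.
from difflib import get_close_matches

substitutes = {                                   # illustrative placeholder data
    "paracetamol": ["crocin", "dolo 650", "calpol"],
    "azithromycin": ["azee 500", "zithrox"],
}

def suggest_alternatives(user_input: str):
    # Correct a misspelled or partial medicine name to the closest known one.
    matches = get_close_matches(user_input.lower(), substitutes.keys(), n=1, cutoff=0.6)
    if not matches:
        return None, []
    corrected = matches[0]
    return corrected, substitutes[corrected]

print(suggest_alternatives("paracitamol"))        # -> ('paracetamol', [...])
```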

Finally, our project encompasses the development of a basic forum where patients can select the major
symptoms they are experiencing and specify the number of days they have been facing those symptoms.
Patients can also provide information about any additional minor symptoms they may be experiencing. For
this task, we have employed the Decision Tree and Random Forest Classifier algorithms. The forum serves
as a platform for patients to communicate their symptoms, enabling the model to make informed
predictions or suggestions based on the provided information. By utilizing these classification algorithms,
we aim to enhance the accuracy of symptom-based predictions and assist patients in gaining insights into
their health conditions.
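
The sketch below illustrates the general idea of symptom-based prediction with a Random Forest over one-hot encoded symptoms; the symptom list, training rows, and disease labels are made up for illustration and are not the project's dataset.

```python
# Minimal sketch: symptoms are one-hot encoded and fed to a Random Forest classifier.
from sklearn.ensemble import RandomForestClassifier

symptoms = ["fever", "cough", "headache", "fatigue", "chest_pain"]  # illustrative
X_train = [
    [1, 1, 0, 1, 0],   # example row labelled "flu"
    [0, 1, 0, 1, 1],   # example row labelled "pneumonia"
    [1, 0, 1, 1, 0],   # example row labelled "malaria"
]
y_train = ["flu", "pneumonia", "malaria"]

clf = RandomForestClassifier(n_estimators=50).fit(X_train, y_train)

def predict_disease(selected):
    # Build a one-hot vector from the symptoms the patient selected.
    vector = [[1 if s in selected else 0 for s in symptoms]]
    return clf.predict(vector)[0]

print(predict_disease(["fever", "cough", "fatigue"]))
```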

3.2. Functional requirements

Technologies to be used:

Machine learning:

Machine learning is a field of study within artificial intelligence that focuses on developing algorithms and
models that enable computer systems to learn and make predictions or decisions without being explicitly
programmed. It involves the use of statistical techniques to allow machines to automatically analyze and
interpret data, identify patterns, and make informed decisions or predictions based on that data.

At its core, machine learning involves training a model using a large amount of data. The model learns
from this data by identifying the patterns, connections, and trends that exist within it. The data used for training is
generally labeled, meaning that it is accompanied by the correct or desired outputs. The model uses this
labeled data to learn the mapping between the input data and the desired output.

Once the model is trained, it can be used to make predictions or decisions on new, unseen data. It takes the
input data and applies the patterns and connections it learned during training to generate an output or
prediction. The model's performance is evaluated by comparing its predictions with the correct or expected
outputs from the new data.
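
For example, a minimal train/predict/evaluate loop might look like the sketch below, using scikit-learn's bundled iris dataset so that it runs without any external files.

```python
# Minimal illustration of the train / predict / evaluate loop described above.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                       # labelled training data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)  # learn from labelled data
predictions = model.predict(X_test)                     # apply learned mapping to unseen data
print("accuracy:", accuracy_score(y_test, predictions)) # compare with the expected outputs
```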

Machine learning algorithms can be broadly categorized into three types: supervised learning, unsupervised
learning, and reinforcement learning. In supervised learning, the model learns from labeled data, making
predictions based on known inputs and outputs. Unsupervised learning involves learning from
unlabeled data, where the model identifies patterns and structures in the data without any specific desired
outputs.

Reinforcement learning involves training an agent to interact with an environment, learning from feedback in the
form of rewards or penalties to optimize its actions. When the machine performs the task correctly it is given a
reward, and when it does not it receives negative feedback instead. Through repeating this process, the machine
learns what to do and what not to do.

Machine learning has numerous applications across various domains, including image and speech
recognition, natural language processing, recommendation systems, fraud detection, medical diagnosis, and
many more. It has revolutionized industries by enabling automation, data-driven decision-making, and
improved efficiency.

In summary, machine learning is a field that focuses on training models to learn from data, identify
patterns, and make predictions or decisions. It involves the use of algorithms and statistical techniques to enable
machines to learn and improve their performance over time without explicit programming. Machine
learning finds applications in diverse areas, contributing to advancements in technology and enhancing
various aspects of our lives.

Deep learning:

Deep learning, a subset of machine learning, has emerged as a powerful approach to solve complex
problems by training deep neural networks with multiple layers. These deep neural networks are designed
to mimic the structure and function of the human brain, enabling them to learn and process data in a
hierarchical manner.

The neural network architecture in deep learning consists of interconnected nodes, or artificial neurons,
organized into layers. Each neuron takes input signals, performs a mathematical operation on them, and
generates an output. The layers are stacked one on top of another, with each layer extracting increasingly
abstract and meaningful representations from the data.

The training process in deep learning involves iteratively fine-tuning the network's internal parameters,
known as weights and biases, through a technique called backpropagation. Backpropagation calculates the
gradient of the network's error with respect to its parameters, allowing it to adjust them in a way that
minimizes the error. By repeating this process over a large dataset, the network gradually learns to
recognize and understand complex patterns and relationships in the data.
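
As a small illustration of this training loop, the sketch below fits a tiny Keras network on synthetic data; the architecture, data, and hyperparameters are arbitrary placeholders, and backpropagation is handled internally by compile() and fit().

```python
# Minimal sketch: a small multi-layer network trained by iterative gradient updates.
import numpy as np
import tensorflow as tf

X = np.random.rand(200, 4).astype("float32")            # synthetic inputs
y = (X.sum(axis=1) > 2.0).astype("float32")             # toy binary labels

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=16, verbose=0)     # backpropagation happens here
print(model.evaluate(X, y, verbose=0))
```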

One of the key advantages of deep learning is its ability to automatically learn features from raw,
unstructured data. Traditional machine learning algorithms often rely on handcrafted features, which
require domain expertise and can be time-consuming and error-prone. In contrast, deep learning models can
learn representations directly from the raw data, eliminating the need for manual feature engineering.

Deep learning has witnessed remarkable success in various domains. In computer vision, deep neural
networks have achieved unprecedented performance in tasks such as image classification, object detection,
and image segmentation. They can learn to recognize objects, scenes, and patterns in images, enabling
applications like self-driving cars, facial recognition, and medical imaging analysis. In natural language
processing, deep learning has revolutionized language understanding and generation tasks. Deep neural
networks can process and comprehend textual data, allowing for tasks like sentiment analysis, language
translation, chatbots, and question-answering systems. They can also generate human-like text, enabling
applications like speech synthesis and language generation.

Deep learning has also made significant contributions to speech recognition and audio processing. By
training deep neural networks on large speech datasets, it has become possible to accurately transcribe
speech, enable voice assistants, and even improve hearing aids.

The impact of deep learning extends beyond these domains. It has been successfully applied in
recommendation systems, fraud detection, financial market analysis, drug discovery, and many other areas.
The ability of deep neural networks to learn intricate patterns and extract meaningful representations from
complex data has opened up new possibilities for solving challenging problems across various fields.

Despite its tremendous success, deep learning also faces certain challenges. Training deep neural networks
requires a large amount of labeled data and substantial computational resources. It can be prone to
overfitting, where the network performs well on the training data but fails to generalize to new, unseen
data. Additionally, the interpretability of deep learning models is often limited, making it difficult to
understand the underlying reasoning behind their predictions.

In conclusion, deep learning is a subset of machine learning that harnesses the power of deep neural
networks to automatically learn and represent complex patterns and relationships in data. Its hierarchical
architecture, coupled with the ability to process unstructured data, has led to breakthroughs in computer
vision, natural language processing, speech recognition, and various other domains. By enabling machines
to learn directly from raw data, deep learning has paved the way for transformative advancements and has
the potential to revolutionize numerous fields in the future.

Flask:

Flask is a popular and lightweight web framework written in Python. It provides a simple and flexible way
to build web applications, APIs (Application Programming Interfaces), and other web services. Flask
follows the microframework philosophy, focusing on simplicity and extensibility, allowing developers to
have more control over their applications.

One of the key features of Flask is its minimalist design. It provides only the essentials for web
development, allowing developers to choose and integrate additional libraries and tools based on their
specific requirements. This flexibility makes Flask suitable for a wide range of applications, from small
personal projects to large-scale enterprise applications.

Flask is built on top of the Werkzeug WSGI (Web Server Gateway Interface) toolkit and the Jinja2
templating engine. Werkzeug provides the low-level handling of HTTP requests and responses, while
Jinja2 offers a powerful and efficient templating system for generating dynamic HTML or other output
formats.

To start building a Flask application, developers define routes, which are URLs that map to specific
functions or views. These functions are called view functions and are responsible for handling incoming
requests and generating responses. A typical view function in Flask receives a request object and returns a
response object, which can be a rendered template, a JSON response, or any other type of data.
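
A minimal Flask route and view function might look like the sketch below; the endpoint name and response payload are illustrative and do not represent the project's actual API.

```python
# Minimal Flask sketch: one route mapped to a view function that returns JSON.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json()                 # patient values submitted by the client
    # ... a trained model would be applied to `data` here ...
    return jsonify({"disease": "example", "received": data})

if __name__ == "__main__":
    app.run(debug=True)
```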

Flask also supports the use of templates, allowing developers to separate the presentation logic from the
application logic. Templates are written using Jinja2 syntax and can include dynamic data, control
structures, and template inheritance, providing a powerful way to generate dynamic content.

In addition to routes and views, Flask offers various features and extensions that can be easily integrated
into applications. For example, Flask provides built-in support for handling form data, managing user
sessions, handling file uploads, and working with cookies. It also supports secure authentication and
authorization mechanisms, enabling developers to build secure web applications.

Flask's modular design makes it easy to extend and integrate third-party libraries. There is a vast ecosystem
of Flask extensions available, covering areas such as database integration, authentication and authorization,
RESTful APIs, testing, and more. These extensions provide additional functionality and simplify common
tasks, saving development time and effort.

Deployment of Flask applications is straightforward. Flask can be run on any WSGI-compliant web server,
such as Gunicorn or uWSGI. It can also be easily containerized using tools like Docker, allowing for
scalable and portable deployments. Additionally, Flask supports the development of RESTful APIs, making
it a popular choice for building backend services that communicate with client applications or other
systems.

Flask's community is vibrant and active, providing extensive documentation, tutorials, and resources for
learning and getting started. The simplicity and versatility of Flask, combined with its supportive
community, make it an excellent choice for developers seeking a lightweight yet powerful web framework
in Python.
In summary, Flask is a lightweight and flexible web framework for Python. It follows a microframework
philosophy, providing the essentials for web development while allowing developers to choose and
integrate additional libraries as needed. With its minimalist design, support for templates, extensive
ecosystem of extensions, and ease of deployment, Flask empowers developers to build a wide range of web
applications and services efficiently and effectively.

Django:

Django is a high-level Python web framework that follows the model-view-controller (MVC) architectural
pattern. It provides a robust set of tools and libraries for building web applications rapidly and efficiently.

At its core, Django focuses on the principle of "Don't Repeat Yourself" (DRY), which means avoiding
duplication of code by promoting reusability and modularity. It emphasizes clean design, scalability, and
ease of maintenance.

Key components of Django include:

1. Models: Models define the structure and behavior of data in your application. They are represented as
Python classes and interact with the underlying database. Django's Object-Relational Mapping (ORM)
allows you to work with databases using Python code, abstracting away the need to write complex SQL
queries.

2. Views: Views handle the logic behind processing user requests and returning responses. They receive
HTTP requests, retrieve data from the database using models, perform any necessary computations or
transformations, and render templates to generate the final HTML response.

3. Templates: Templates provide a way to generate dynamic HTML pages by combining static content with
data. Django uses its own template language, which allows you to insert variables, control structures, and
filters into your HTML templates. Templates can be reused and extended to create consistent layouts and
design across your application.

4. URLs: URLs define the mapping between user-friendly URLs and the views that handle them. Django
uses a URL dispatcher that matches the requested URL with a corresponding view function, allowing you
to define patterns and parameters in the URL patterns.

5. Forms: Django provides a powerful form handling system that simplifies the process of building and
processing HTML forms. Forms handle data validation, error messages, and can be rendered automatically
based on the model definition. They help you handle user input securely and efficiently.

6. Admin: Django's admin interface is a built-in component that automatically generates a web-based
administration panel for managing your application's data. It allows you to perform CRUD (Create, Read,
Update, Delete) operations on models without writing additional code.

7. Middleware: Middleware is a way to process requests and responses globally across your application. It
sits between the web server and the Django application, allowing you to perform operations such as
authentication, request/response modification, and error handling.

8. Authentication and Authorization: Django provides a secure authentication system out of the box,
allowing users to register, log in, and log out. It also includes authorization mechanisms to define user
permissions and access control.

9. Testing: Django includes a comprehensive testing framework that helps you write unit tests for your
application. It provides tools for testing models, views, forms, and other components, ensuring the
reliability and quality of your code.
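
To illustrate how a few of these pieces fit together, the schematic snippets below show a model, a view, and a URL pattern; they assume an already configured Django project and app, and all names are illustrative rather than taken from this project.

```python
# Schematic snippets only; they assume an already configured Django project/app.

# models.py
from django.db import models

class Patient(models.Model):
    name = models.CharField(max_length=100)
    age = models.IntegerField()

# views.py
from django.http import JsonResponse
# from .models import Patient   # import within the app

def patient_count(request):
    # Query the database through the ORM and return a JSON response.
    return JsonResponse({"patients": Patient.objects.count()})

# urls.py
from django.urls import path
# from .views import patient_count

urlpatterns = [
    path("patients/count/", patient_count),   # maps the URL to the view above
]
```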

Django is known for its extensive documentation, strong community support, and a wide range of third-
party packages available through the Python Package Index (PyPI). It is widely used in both small and
large-scale web applications, and it powers popular websites such as Instagram, Pinterest, and The
Washington Post.

Overall, Django simplifies web development by providing a robust foundation, reducing boilerplate code,
and promoting best practices. It allows developers to focus on building application logic and delivering
value to users.

SQLite:

SQLite is a popular relational database management system (RDBMS) known for its lightweight,
serverless, and self-contained nature. It is implemented in the C programming language and offers a
straightforward yet powerful SQL interface for managing data efficiently.

One of the standout features of SQLite is its serverless architecture. Unlike traditional database systems,
SQLite doesn't require a separate server to function. Instead, it operates directly within the application,
simplifying deployment and reducing dependencies. This self-contained approach makes SQLite highly
portable and easy to manage since the entire database, including tables, indexes, views, and triggers, is
stored within a single file on the disk.

Another advantage of SQLite is its cross-platform compatibility. It seamlessly works on various operating
systems such as Windows, macOS, Linux, and even mobile platforms like Android and iOS. This
flexibility makes it a versatile choice for applications targeting multiple platforms.

SQLite adheres to ACID principles, ensuring reliable and consistent transactions. It guarantees that
operations within a transaction are either fully completed or rolled back, maintaining data integrity. This
level of reliability is crucial for applications where data consistency is paramount.

In terms of functionality, SQLite supports a subset of the SQL standard. It provides essential features for
managing databases, including creating tables, inserting, updating, and deleting data, executing complex
queries with joins and filtering conditions, and managing indexes for improved performance.
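
For example, the sketch below uses Python's built-in sqlite3 module to create a table, insert a row, and query it, all within a single local database file; the table and file names are illustrative.

```python
# Minimal sqlite3 sketch: the whole database lives in one local file.
import sqlite3

conn = sqlite3.connect("app.db")               # hypothetical database file
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS medicines (name TEXT, substitute TEXT)")
cur.execute("INSERT INTO medicines VALUES (?, ?)", ("paracetamol", "crocin"))
conn.commit()

for row in cur.execute("SELECT * FROM medicines WHERE name = ?", ("paracetamol",)):
    print(row)
conn.close()
```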

Despite its feature-rich capabilities, SQLite has a small footprint and operates efficiently. The compact
database engine requires minimal system resources and consumes little memory, making it suitable for
applications with limited resources or performance constraints.

Furthermore, SQLite offers flexibility and extensibility. It provides a variety of built-in data types,
accommodating various data requirements. Additionally, users can define custom functions, aggregates,
and virtual tables, allowing the database engine to be tailored to specific application needs.

Due to its wide adoption, SQLite has a large user base and is utilized in a broad range of applications. It is
commonly found in mobile apps, desktop software, embedded systems, web browsers, and IoT devices.
Many programming languages provide bindings or drivers for SQLite, making it accessible from different
development environments.

In conclusion, SQLite is a well-documented, reliable, and efficient database system suitable for applications
that require a lightweight, embedded solution without the complexities of client-server setups. Its serverless
architecture, self-contained nature, cross-platform compatibility, ACID compliance, SQL support, small
footprint, and extensibility contribute to its popularity among developers.

Tensorflow:

TensorFlow is a versatile and scalable open-source library for numerical computation and machine
learning. Developed by the Google Brain team, it has gained significant popularity among developers due
to its extensive capabilities. TensorFlow enables efficient building and training of machine learning
models.

At the heart of TensorFlow lies the concept of data flow graphs. These graphs consist of nodes representing
mathematical operations and edges representing the flow of data between these operations. This approach
enables TensorFlow to harness the power of parallel and distributed computing, making it suitable for tasks
of varying scales.

A key concept in TensorFlow is the tensor, which is a multidimensional array or data structure. Tensors can
hold different types of numeric data, such as integers or floating-point numbers. These tensors flow through
the computational graph, undergoing various operations and transformations.

TensorFlow provides a vast collection of pre-built operations that can be applied to tensors. These
operations include mathematical operations (e.g., addition, multiplication), matrix operations (e.g., matrix
multiplication, transposition), activation functions (e.g., sigmoid, ReLU), and more. Additionally,
TensorFlow allows users to define their custom operations using Python or C++.
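
As a small illustration, the sketch below creates tensors and applies a couple of built-in operations in TensorFlow 2's eager execution mode; the values are arbitrary.

```python
# Minimal sketch of tensors and built-in operations in TensorFlow 2 (eager mode).
import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])      # a 2x2 tensor
b = tf.constant([[1.0], [0.5]])                # a 2x1 tensor

product = tf.matmul(a, b)                      # matrix multiplication
activated = tf.nn.relu(product - 2.0)          # an activation function applied element-wise

print(product.numpy(), activated.numpy())
```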

When working with TensorFlow, the typical workflow involves the following steps:

Building the computational graph: The first step is to define the structure of the graph by creating nodes for
operations and tensors to hold the data. Inputs, outputs, and connections between nodes are specified to
establish the graph's structure.

Executing the graph in a session: After defining the graph, a session is created to execute the operations. A
session allocates resources and performs computations on available devices like CPUs or GPUs. It manages
memory and variables used in the graph.

Feeding data and running computations: In the session, input data can be fed to the graph using
placeholders or variables. Placeholders are used for data that will be provided during execution, while
variables are used for data that will be updated and optimized during training.

Updating variables through optimization: TensorFlow incorporates various optimization algorithms, such
as stochastic gradient descent (SGD), which update variable values based on computed errors or loss
functions. These algorithms iteratively adjust the model's parameters to minimize the discrepancy between
predicted and actual outputs, facilitating model training.

Saving and restoring models: Trained models can be saved to disk for future use or deployment.
TensorFlow provides tools to save and restore models, allowing the reuse of trained models without the
need for retraining.

TensorFlow also offers high-level APIs like Keras, which provide simplified interfaces for common
machine learning tasks. These APIs abstract away low-level details, making it easier for users to build
models.

In summary, TensorFlow is a powerful library for numerical computation and machine learning. It utilizes
data flow graphs and tensors for efficient computations, offering a wide range of operations and
optimization algorithms. By following the steps of building the graph, executing it in a session, feeding
data, and optimizing variables, developers can harness TensorFlow's capabilities to create and train diverse
machine learning models.

Language to be used

HTML:
HTML, short for HyperText Markup Language, is the standard markup language utilized for building and
organizing web pages on the internet. It employs tags to define the structure and content of a webpage.

When a web browser reads an HTML document, it interprets the tags and displays the content accordingly.
Tags are enclosed in angle brackets (<>) and come in pairs: an opening tag and a closing tag. Webpage
content is placed between these tags.

HTML elements can be nested within one another, forming a hierarchical structure. The top-level element
is the <html> element, which encompasses the entire document. Typically, within the <html> element,
you'll find the <head> and <body> elements. The <head> element contains metadata such as the page title,
external stylesheets, or scripts. The <body> element holds the visible content of the webpage.

Within the <body> element, various elements can be used to structure and present content. Commonly used
elements include:

- <h1> to <h6> for different heading levels

- <p> for paragraphs

- <a> for hyperlinks

- <img> for inserting images

- <ul> and <ol> for unordered and ordered lists, respectively

- <li> for list items within lists

- <table> for creating tables

- <form> for input forms

These are just a few examples of HTML elements. Attributes can be added to elements to provide
additional information or functionality. Attributes are specified within the opening tag of an element and
consist of a name and a value.

HTML also supports CSS (Cascading Style Sheets) for defining the visual appearance of web pages and
JavaScript for adding interactivity and dynamic behavior.

In summary, HTML is a markup language enabling the structuring and presentation of web content.
Through the use of HTML elements, tags, and attributes, developers can create engaging and interactive
web pages that can be rendered by web browsers.

Javascript:

JavaScript is a popular programming language used for web development. It empowers developers to
enhance web pages by incorporating interactive features, dynamic behavior, and functionality. JavaScript is
widely supported by modern web browsers and can be seamlessly integrated into HTML documents.

To embed JavaScript into HTML, developers utilize the <script> element. This element is typically placed
within the <head> or <body> section of an HTML document. Alternatively, JavaScript code can be stored
in separate external files and linked to HTML documents using the <script> element's "src" attribute.

JavaScript encompasses various key aspects essential for web development:

1. Variables and Data Types: JavaScript enables developers to declare variables for storing and
manipulating data. It supports different data types, including numbers, strings, booleans, arrays, and
objects.
2. Functions: JavaScript functions are reusable blocks of code designed to perform specific tasks. They can
be defined and invoked to execute a set of instructions. Functions can also accept parameters and return
values.

3. Control Flow: JavaScript provides control flow statements such as conditionals (if-else statements),
loops (for, while, do-while), and switch statements. These statements determine the execution path based
on certain conditions or iterate over a set of values.

4. DOM Manipulation: The Document Object Model (DOM) represents the hierarchical structure of an
HTML document as a tree-like structure. JavaScript offers APIs (Application Programming Interfaces) to
manipulate the DOM, enabling developers to dynamically modify elements, update content, handle events,
and create interactive web pages.

5. Event Handling: JavaScript facilitates the handling of user interactions, including mouse clicks, keyboard
input, and form submissions. Event handlers can be attached to specific elements to trigger JavaScript code
when the corresponding event occurs.

6. Asynchronous Programming: JavaScript supports asynchronous operations, allowing tasks to be
executed without blocking the execution of other code. This capability is commonly employed for making
AJAX requests to fetch data from servers, performing animations, or handling user input without freezing
the user interface.

7. Error Handling: JavaScript provides mechanisms for managing and handling errors. Developers can
utilize try-catch blocks to catch and handle exceptions, ensuring that code execution proceeds smoothly
even in the presence of errors.

8. Libraries and Frameworks: JavaScript boasts a vast ecosystem of libraries and frameworks that extend its
capabilities and simplify common tasks. Popular libraries include jQuery, React, Angular, and Vue.js,
which provide pre-built solutions for constructing complex web applications.

It's important to note that while JavaScript primarily runs on the user's web browser (client-side language),
the emergence of server-side JavaScript platforms like Node.js has enabled the use of JavaScript for server-
side scripting as well.

In summary, JavaScript is a versatile and powerful programming language pivotal to web development. Its
ability to interact with HTML and CSS, manipulate the DOM, handle events, perform asynchronous
operations, and leverage libraries and frameworks makes it an essential tool for creating dynamic and
interactive web applications.

Python:

Python is a versatile and widely used programming language that offers a range of features and capabilities.
One of its key strengths is its dynamic typing, which allows for flexible variable assignment without
requiring explicit type declarations. This makes Python code concise and easy to read.

Python supports both object-oriented and functional programming paradigms, giving developers the
flexibility to choose the approach that best suits their needs. Object-oriented programming allows for the
creation and manipulation of objects, while functional programming focuses on writing code in a
declarative and immutable manner.

Guido van Rossum created Python in 1989 with the goal of designing a language that emphasized code
readability and simplicity. As a high-level language, Python provides an abstraction layer that hides low-
level details, making it accessible to beginners and experienced programmers alike.

One of the advantages of Python is its platform independence. Python programs can run on various
operating systems, such as Windows, macOS, and Linux, without requiring modifications. This portability
makes debugging and deploying Python code more convenient.

Python is an open-source language, which means that its source code is freely available to view, change,
and distribute. This has led to a large and active community of developers who contribute to the
language's growth and development. The open-source nature of Python also enables the creation of a vast
ecosystem of libraries and frameworks.

Python boasts a rich library support, with popular libraries such as NumPy, TensorFlow, Selenium, and
OpenCV. NumPy provides efficient numerical operations and multi-dimensional array manipulation,
making it indispensable for scientific computing and data analysis. TensorFlow is a library for machine
learning and deep learning, with which we can create and train complex neural networks.
Selenium is widely used for automating web browsers, making it valuable for web scraping and testing.
OpenCV is a computer vision library that offers a wide range of tools and algorithms for image and video
processing.

One of Python's strengths is its ability to seamlessly integrate with other programming languages. Through
interfaces and modules, Python code can interact with libraries written in languages such as C, C++, and
Java. This interoperability allows developers to leverage existing code and take advantage of specialized
libraries in their Python projects.

Python finds applications in a variety of domains due to its versatility and extensive library support. In data
visualization, Python offers libraries like Matplotlib and Seaborn, enabling the creation of insightful plots
and graphical representations. In data analytics, Python provides tools such as Pandas and SciPy for
processing and analyzing raw data, uncovering trends, and deriving actionable insights.
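
As a tiny illustration of this workflow, the sketch below builds a small pandas DataFrame, derives a column, and prints a statistical summary; the data is made up.

```python
# Tiny illustration of the pandas workflow mentioned above; the data is fabricated.
import pandas as pd

df = pd.DataFrame({"patient": ["A", "B", "C"], "glucose": [110, 180, 95]})
df["high_glucose"] = df["glucose"] > 140      # derive a simple flag column
print(df.describe())                          # quick statistical summary
print(df[df["high_glucose"]])                 # filter rows of interest
```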

Python has gained significant popularity in the field of artificial intelligence (AI) and machine learning
(ML). Its simplicity and powerful libraries like scikit-learn and Keras make it accessible for building and
training ML models. Python is used for tasks such as natural language processing, computer vision, and
predictive analytics, allowing developers to simulate human behavior and learn from large datasets.

Python's versatility extends to web application development as well. Frameworks like Django and Flask
provide robust tools for building scalable and secure web applications. Python's simplicity and readability
make it an excellent choice for both backend development and scripting tasks on the web.

Additionally, Python is used in business and accounting for statistical analysis, and quantitative and qualitative
analysis. Its extensive library support, including libraries like NumPy and Pandas, makes it a powerful tool
for financial modeling, risk analysis, and decision-making processes.

In conclusion, Python's simplicity, versatility, and extensive library support have made it a popular choice
for a wide range of applications. Whether it's data analysis, AI and machine learning, web development, or
business and accounting, Python provides the tools and flexibility to tackle diverse programming
challenges efficiently and effectively.

Platforms to be used

Colab:

Colab, short for Google Colaboratory, is a cloud-based Python development environment provided by
Google. It offers features for writing, executing, and collaborating on Python code, making it convenient
for data analysis, machine learning, and general-purpose programming.

Colab eliminates the need for setting up a local development environment. Instead, it can be accessed
through a web browser, allowing you to write and run Python code from any device with an internet
connection. This cloud-based approach removes the hassle of installing and configuring Python and its
dependencies, making it accessible to users of all levels.

The interface of Colab resembles Jupyter Notebook, with code organized into cells. These cells can contain
Python code, text explanations, or visualizations, promoting an interactive and iterative coding experience.
Code cells can be run individually or all at once.

One notable advantage of Colab is its ability to utilize powerful hardware resources, including GPUs and
TPUs. This is especially useful for computationally intensive tasks like machine learning model training.
Colab provides free access to a limited amount of GPU and TPU resources, significantly accelerating code
execution.

Colab seamlessly integrates with other Google services and libraries. It has built-in integration with Google
Drive, enabling easy import and export of data between Colab and Google Drive storage. The environment
also includes commonly used libraries and packages for data analysis and machine learning, such as
NumPy, Pandas, Matplotlib, and scikit-learn, saving setup time.

Colab notebooks can be shared and collaborated on with others. You can grant access to specific
individuals or make the notebook publicly available. Collaboration features include leaving comments on
code or text cells, facilitating teamwork and code reviews.

Markdown, a lightweight markup language, is supported in Colab. Markdown cells allow you to include
formatted text, images, and equations, making it suitable for documentation, explanations, or creating richly
formatted reports.
Colab enables the installation and use of additional Python packages using the `!pip` command within code
cells. This flexibility allows you to extend Colab's functionality with third-party libraries not included by
default.
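
For example, a package that is not pre-installed can be added and then used in later cells; the package shown below is only an illustration, not a Colab requirement:

!pip install fuzzywuzzy python-Levenshtein
from fuzzywuzzy import process          # the installed package is importable in the same notebook
print(process.extractOne("paracetamole", ["paracetamol", "ibuprofen"]))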

While Colab offers many advantages, there are some limitations. The allocated resources for each user have
limits, including maximum execution time and memory usage. Long-running or memory-intensive
processes may be interrupted or terminated. However, you can save your work periodically to Google
Drive or download the notebook for future use.

In summary, Colab is a powerful cloud-based Python development environment. It provides an interactive
notebook interface, access to powerful hardware resources, seamless integration with Google services, and
collaboration features. With its convenience and accessibility, Colab is suitable for data analysis, machine
learning, and general-purpose Python programming for users of all skill levels.

Roboflow:

Roboflow is an advanced computer vision platform that simplifies the process of building and deploying
computer vision models. It provides a range of tools and services designed to streamline and accelerate the
development workflow for computer vision applications.

The core functionality of Roboflow revolves around managing and preprocessing image datasets. It offers a
user-friendly web interface where you can upload, organize, and annotate your image data. Roboflow
supports various annotation formats, including bounding boxes, polygons, and image classifications,
allowing you to label and annotate your dataset accurately.

Once your dataset is uploaded and annotated, Roboflow provides a suite of powerful preprocessing tools.
These tools enable you to augment and transform your images, enhancing their quality and diversity.
Roboflow offers a wide range of preprocessing options such as resizing, cropping, rotating, flipping,
adjusting brightness and contrast, and applying filters. These transformations help to increase the variability
of your dataset, leading to better model performance and robustness.

Roboflow also provides an extensive library of pre-built computer vision models. These models are ready
to use, allowing you to train your models quickly without the need for complex configuration or
implementation. The library includes popular architectures such as YOLO, SSD, and Faster R-CNN, which
are widely used for object detection tasks. By leveraging these pre-built models, you can save time and
effort in developing your own models from scratch.

Training models in Roboflow is a straightforward process. Once you have prepared your dataset and
selected a model, you can initiate the training process with a few clicks. Roboflow takes care of the training
pipeline, including data loading, model training, and performance evaluation. During training, you can
monitor key metrics such as loss, accuracy, and learning curves to assess the model's progress.

Roboflow allows you to iterate and experiment with different training configurations easily. You can adjust
hyperparameters, such as learning rate, batch size, and optimizer, to optimize your model's performance.
Additionally, Roboflow supports transfer learning, which enables you to leverage pre-trained models and
fine-tune them on your specific dataset. This approach can significantly speed up training and improve
model accuracy.

After training, Roboflow provides evaluation tools to assess the performance of your models. You can
visualize and analyze the model's predictions on your validation or test dataset. This helps in understanding
the model's strengths and weaknesses and identifying areas for improvement.

Once you have trained and evaluated your models, Roboflow offers deployment options to bring your
computer vision models into production. It provides integrations with popular deployment platforms and
frameworks, making it easy to deploy models to cloud environments, edge devices, or as APIs.

In summary, Roboflow is a comprehensive computer vision platform that simplifies the development
workflow for building and deploying computer vision models. Its features include dataset management,
annotation tools, powerful preprocessing capabilities, pre-built models, training pipeline management,
evaluation tools, and deployment options. Roboflow empowers developers to create accurate and robust
computer vision applications efficiently.

3.3. Block diagram

FIGURE 3.1: Flow of the web application

FIGURE 3.2: Diagram for Admin/User login Procedure

FIGURE 3.3: Working of Disease Prediction

FIGURE 3.4: Working of symptom-based disease prediction

3.4. Algorithms:

Heart Disease Prediction:


⮚ Import the necessary libraries.
⮚ Read the dataset from a CSV file into a pandas DataFrame.
⮚ Perform exploratory data analysis on the dataset (e.g., info, describe, correlation heatmap, histograms,
etc.).
⮚ Split the dataset into feature variables (X) and the target variable (y).
⮚ Define a class to evaluate and compare multiple machine learning models:
o Initialize the class by taking feature variables (X) and the target variable (y) as input.
o Split the data into training and test sets using train_test_split.
o Apply standard scaling to the training and test sets using StandardScaler.
o Define a dictionary of machine learning models and their names.
o For each model:
o Fit the model to the training data.
o Calculate the training and test scores.
o Print evaluation metrics (confusion matrix, accuracy, precision, recall, F1 score, specificity).
o Track the best model based on accuracy.
o Track the model that takes the least time for training and testing.
⮚ Provide methods to plot the training and test scores, get the evaluation data as a dictionary, and get
the evaluation data as a pandas DataFrame.
⮚ Print the best model based on accuracy.
⮚ Create an instance of the evaluation class with the feature variables (X) and target variable (y).
⮚ Call the appropriate methods to visualize the results and get the evaluation data.
⮚ Perform logistic regression on the dataset.
⮚ Save the trained model and standard scaler using pickle.
⮚ Load the saved model and scaler.
⮚ Calculate the score of the loaded model on the test data.
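
As an illustration of the final steps above, the following is a minimal sketch of training and persisting the logistic regression model (not the full evaluation class); the file name "heart.csv" and the column name "target" are assumptions for the example rather than the exact names used in our notebook:

import pickle
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Read the dataset (file and column names are illustrative)
data = pd.read_csv("heart.csv")
X = data.drop(columns=["target"])
y = data["target"]

# Split into training and test sets, then standardize the features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train logistic regression and print evaluation metrics on the test set
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
pred = model.predict(X_test)
print(confusion_matrix(y_test, pred))
print("Accuracy:", accuracy_score(y_test, pred))

# Persist the trained model and the scaler so the web application can reuse them
with open("heart_model.pkl", "wb") as f:
    pickle.dump(model, f)
with open("heart_scaler.pkl", "wb") as f:
    pickle.dump(scaler, f)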

Liver Disease Prediction:


⮚ Import the required libraries: pandas, numpy, matplotlib.pyplot, seaborn, train_test_split,
confusion_matrix, accuracy_score, StandardScaler, XGBClassifier, time, and pickle.
⮚ Load the dataset using `read_csv`.
⮚ Preprocess the data by replacing categorical values, handling missing values, and renaming columns.
⮚ Visualize the data using correlation matrix and pair plots.
⮚ Define a class `evaluate_all_model`:
o Initialize with input variables x, y, and given_state.
o Split data into training and testing sets, and apply feature scaling.
o Define models and evaluation metrics.
o Evaluate each model:
o Fit the model, calculate scores, and metrics.
o Update the best model based on accuracy.
o Plot bar chart of scores.
o Retrieve evaluation data in dictionary or dataframe format.
⮚ Create an instance of `evaluate_all_model` with x, y, and given_state.
⮚ Retrieve evaluation results in dataframe format.
⮚ Perform oversampling using SMOTE to address class imbalance.
⮚ Create another instance of `evaluate_all_model` with updated x and y.
⮚ Perform hyperparameter tuning for RandomForestClassifier.
⮚ Create a final instance of RandomForestClassifier with best hyperparameters.
⮚ Split data into training and testing sets.
⮚ Fit final model to training data and calculate test score.
⮚ Save the trained model using pickle.
⮚ Perform additional operations, such as accessing column names and calculating maximum values.
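
A minimal sketch of the oversampling and tuning steps is given below; the file name, the "Gender" and "Dataset" column names, and the hyperparameter grid are illustrative assumptions:

import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV

data = pd.read_csv("indian_liver_patient.csv")              # illustrative file name
data["Gender"] = data["Gender"].replace({"Male": 1, "Female": 0})
data = data.fillna(data.mean(numeric_only=True))            # handle missing values
X = data.drop(columns=["Dataset"])                          # "Dataset" assumed as the target column
y = data["Dataset"]

# Oversample the minority class to address the class imbalance
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)

# Simple hyperparameter search for the random forest
param_grid = {"n_estimators": [100, 200], "max_depth": [None, 10, 20]}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)

X_train, X_test, y_train, y_test = train_test_split(X_res, y_res, test_size=0.3, random_state=42)
search.fit(X_train, y_train)
print("Best params:", search.best_params_)
print("Test score:", search.best_estimator_.score(X_test, y_test))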

Diabetes Disease Prediction:

⮚ Import the required libraries: pandas, numpy, matplotlib.pyplot, seaborn, scikit-learn (sklearn).
⮚ Load the "Diabetes.csv" dataset using pandas into a DataFrame called "data."
⮚ Explore the correlation between features using a heatmap.
⮚ Define a class called `evaluate_all_model` to evaluate multiple classification models:
o Import the necessary modules and define the required models.
o Initialize attributes for storing model scores, time taken, and the best model.
o Implement methods for splitting data into training and test sets, defining models, and evaluating
models.
o Compute various evaluation metrics, including accuracy, precision, recall, F1 score, and specificity.
o Keep track of the best model based on accuracy and the model with the shortest training and testing
time.
o Plot the training and test scores using a bar plot.
o Retrieve the model evaluation data as a dictionary or DataFrame.
⮚ Create instances of the `evaluate_all_model` class, passing the input features and target variable.
⮚ Plot the bar chart of model scores.
⮚ Split the data into training and test sets using the `train_test_split()` method from sklearn.
⮚ Perform feature scaling on the input features if required using sklearn's `StandardScaler`.
⮚ Create an instance of the Gradient boosting classifier model with specified parameters.
⮚ Fit the model to the training data.
⮚ Calculate the accuracy score of the model on the test data.
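
The final training step above can be sketched as follows; the file name "Diabetes.csv" is taken from the description, while the "Outcome" column name and hyperparameter values are assumptions for illustration:

import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

data = pd.read_csv("Diabetes.csv")
X = data.drop(columns=["Outcome"])          # "Outcome" assumed as the target column
y = data["Outcome"]

# Split, scale, train the gradient boosting classifier, and report test accuracy
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

gbc = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, random_state=42)
gbc.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, gbc.predict(X_test)))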

Breast Cancer Prediction:

⮚ Importing the necessary libraries.


⮚ Reading a CSV file named "breast cancer.csv" into a pandas DataFrame called `data`.
⮚ Preprocessing the data by dropping unnecessary columns, renaming the target column, and replacing the
target labels with numerical values.
⮚ Visualizing the data using various plots, including a count plot, correlation matrix heatmap, and
histograms.
⮚ Splitting the data into training and testing sets using a 70:30 ratio.
⮚ Defining a class called `evaluate_all_model` that evaluates multiple machine learning models on the
breast cancer dataset.
⮚ The class performs the following steps:
o Splits the data into training and testing sets and applies feature scaling using StandardScaler.
o Defines a dictionary of machine learning models to be evaluated.
o Evaluates each model by fitting it to the training data, calculating training and testing scores, and
generating a confusion matrix.
o Stores the scores, time taken, and the best performing model based on accuracy.
o Provides methods to plot the scores and return the evaluation results as a DataFrame.
⮚ Creates an instance of the `evaluate_all_model` class and displays the evaluation results.
⮚ Trains a Logistic Regression model on the entire dataset, evaluates its performance, and saves the
trained model and StandardScaler using pickle.
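
A condensed sketch of this pipeline is given below; the "id" and "diagnosis" column names and the label mapping are assumptions based on a common breast cancer dataset layout and may differ from our file:

import pickle
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = pd.read_csv("breast cancer.csv")
# Drop an identifier column and map the diagnosis labels to numbers (column names assumed)
data = data.drop(columns=["id"], errors="ignore")
data = data.rename(columns={"diagnosis": "target"})
data["target"] = data["target"].replace({"M": 1, "B": 0})

X = data.drop(columns=["target"])
y = data["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

clf = LogisticRegression(max_iter=5000)
clf.fit(X_train, y_train)
print("Test score:", clf.score(X_test, y_test))

# Persist the model and scaler with pickle for later use in the application
with open("breast_cancer_model.pkl", "wb") as f:
    pickle.dump(clf, f)
with open("breast_cancer_scaler.pkl", "wb") as f:
    pickle.dump(scaler, f)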

Stroke Prediction:
⮚ Import the required libraries.
⮚ Suppress warnings.
⮚ Read and preprocess the data.
⮚ Visualize the data.
⮚ Encode categorical columns.
⮚ Handle missing values.
⮚ Split the data into features and target variables.
⮚ Preprocess the features.
⮚ Define a class to evaluate models.
⮚ Initialize the class with features and target.
⮚ Perform cross-validation for model evaluation.
⮚ Define and evaluate multiple models.
⮚ Select the best model based on performance.
⮚ Display the best model and its evaluation metrics.
⮚ Plot the training and test scores for all models.
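
The evaluation flow above can be sketched roughly as follows; the file name and the "id", "bmi", and "stroke" column names are assumptions for illustration, and only two of the compared models are shown:

import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

data = pd.read_csv("stroke.csv")                            # illustrative file name
data = data.drop(columns=["id"], errors="ignore")
data["bmi"] = data["bmi"].fillna(data["bmi"].median())      # handle missing values

# Encode categorical columns as integers
for col in data.select_dtypes(include="object").columns:
    data[col] = LabelEncoder().fit_transform(data[col])

X = StandardScaler().fit_transform(data.drop(columns=["stroke"]))
y = data["stroke"]

# Compare models with 5-fold cross-validation and keep the best one
models = {"LogisticRegression": LogisticRegression(max_iter=1000),
          "RandomForest": RandomForestClassifier(random_state=42)}
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
print(scores)
print("Best model:", max(scores, key=scores.get))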

Pneumonia Disease Prediction:

⮚ Set up the file paths for your training, validation, and test datasets.
⮚ Load and explore your dataset:
o Use suitable functions to list the files in your dataset directories.
o Identify the classes present in your dataset (e.g., normal and pneumonia).
o Print information about the datasets.
⮚ Apply data augmentation techniques:
o Use appropriate functions to perform data augmentation, such as rotation, shifting, shearing, and
zooming.
o Create data generators for your training, validation, and test sets.
⮚ Calculate the class weights:
o Determine the class weights based on the number of samples in each class.
o Store the class weights in a dictionary.
⮚ Choose an appropriate model architecture for your task.
⮚ Build the model:
o Add convolutional layers with batch normalization, pooling, and activation functions.
o Flatten the output and add fully connected layers.
o Include a final output layer with a suitable activation function for your problem.
⮚ Specify the loss function, optimizer, and evaluation metrics for training the model.
⮚ Train the model:
o Use suitable functions to train the model on the training data.
o Set the number of epochs, validation data, class weights, and other relevant parameters.
⮚ Evaluate the model:
o Measure the model's performance on the test set using appropriate evaluation metrics.
o Print the evaluation metrics, such as accuracy, and any other relevant metrics.

⮚ Use the trained model to predict the classes for the test set.
⮚ Perform any necessary post-processing or analysis on the predictions.
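
A simplified sketch of this training pipeline, written with Keras, is shown below; the directory names, image size, class weights, number of epochs, and layer sizes are illustrative assumptions rather than our exact configuration:

from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Directory paths are illustrative
train_dir, val_dir, test_dir = "chest_xray/train", "chest_xray/val", "chest_xray/test"

# Data augmentation for the training set; only rescaling for validation and test
train_gen = ImageDataGenerator(rescale=1./255, rotation_range=15, width_shift_range=0.1,
                               height_shift_range=0.1, shear_range=0.1, zoom_range=0.1)
plain_gen = ImageDataGenerator(rescale=1./255)

train = train_gen.flow_from_directory(train_dir, target_size=(150, 150),
                                      class_mode="binary", batch_size=32)
val = plain_gen.flow_from_directory(val_dir, target_size=(150, 150),
                                    class_mode="binary", batch_size=32)
test = plain_gen.flow_from_directory(test_dir, target_size=(150, 150),
                                     class_mode="binary", batch_size=32, shuffle=False)

# A small CNN: convolution + batch normalization + pooling blocks, then dense layers
model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(150, 150, 3)),
    layers.BatchNormalization(), layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.BatchNormalization(), layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Class weights compensate for the imbalance between normal and pneumonia images
class_weight = {0: 2.0, 1: 1.0}                             # illustrative values
model.fit(train, validation_data=val, epochs=10, class_weight=class_weight)
print(model.evaluate(test))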

Disease Prediction:

⮚ Load and preprocess the training and testing datasets.


⮚ Prepare the input features (`x`) and target variable (`y`) for training the classifier.
⮚ Encode the target variable (`y`) using LabelEncoder to convert categorical labels into numerical values.
⮚ Split the data into training and testing sets using `train_test_split`.
⮚ Initialize and train a decision tree classifier (`clf`) on the training data.
⮚ Calculate feature importances to determine the most significant symptoms.
⮚ Load additional data such as symptom descriptions and severity.
⮚ Define functions to retrieve symptom descriptions, severity, and precautions from the respective
datasets.
⮚ Implement a function to calculate the severity of symptoms based on user input.
⮚ Implement a function to predict diseases based on symptoms using the trained classifier.
⮚ Implement a function to print the predicted disease(s) based on the decision tree's output.
⮚ Implement a function to predict secondary diseases using a separate decision tree classifier.
⮚ Implement a function to check patterns in a list of diseases based on user input.
⮚ Implement a function to recommend consultation or precautions based on symptom severity.
⮚ Implement a function to retrieve a list of diseases based on user input.
⮚ Initialize global variables and call the necessary functions to load data and dictionaries.
⮚ Implement a recursive function (`get_disease`) to traverse the decision tree and determine the disease
based on symptoms.
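
A minimal sketch of the symptom-based classifier is shown below; the file name "Training.csv" and the "prognosis" column are assumptions for illustration, and the helper predict_disease is a hypothetical simplification of the recursive traversal described above:

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# One binary column per symptom plus a "prognosis" column holding the disease name (assumed layout)
train_df = pd.read_csv("Training.csv")
X = train_df.drop(columns=["prognosis"])
encoder = LabelEncoder()
y = encoder.fit_transform(train_df["prognosis"])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
clf = DecisionTreeClassifier().fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))

# Rank symptoms by how much they influence the tree's decisions
importance = pd.Series(clf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importance.head(10))

def predict_disease(symptoms):
    # Build a one-row frame with 1 for each reported symptom and 0 otherwise
    row = pd.DataFrame([[1 if col in symptoms else 0 for col in X.columns]], columns=X.columns)
    return encoder.inverse_transform(clf.predict(row))[0]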

Medicine Recommendation:

⮚ Define a function `recommend(temp)` that takes an input `temp` representing the user's input or query.

⮚ Load or define the necessary data for the recommendation process. This may include a list of choices, a
dataset of medicines with their attributes, and a matrix or similarity scores between medicines.

⮚ Use a fuzzy matching algorithm (such as `process.extractOne` from the `fuzzywuzzy` library) to find the
most similar medicine to the input `temp`. Store this medicine in a variable called `medicine`.

⮚ Find the index of the `medicine` in the medicines dataset. This will be used to retrieve the corresponding
row or record.

⮚ Retrieve the similarity scores or distances between the `medicine` and all other medicines. This can be
done using the similarity matrix or data structure.

⮚ Sort the list of similarity scores in descending order and select the top 5 medicines (excluding the most
similar medicine itself).

⮚ Create an empty list called `recommended_medicines` to store the names of the recommended
medicines.

⮚ Iterate over the list of top 5 medicines and retrieve their corresponding names from the medicines
dataset. Append these names to the `recommended_medicines` list.

⮚ Return a tuple containing the `medicine` (the most similar one) and the `recommended_medicines` list.
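
The recommend function can be sketched as follows; the pickled files "medicines.pkl" and "similarity.pkl" and the "name" column are assumptions standing in for the actual artifacts used in our application:

import pickle
from fuzzywuzzy import process

medicines = pickle.load(open("medicines.pkl", "rb"))     # assumed: DataFrame with a "name" column
similarity = pickle.load(open("similarity.pkl", "rb"))   # assumed: precomputed similarity matrix

def recommend(temp):
    # Fuzzy-match the user's query against the known medicine names
    medicine = process.extractOne(temp, medicines["name"].tolist())[0]
    idx = medicines[medicines["name"] == medicine].index[0]
    # Rank all medicines by similarity to the matched one and keep the top 5 (excluding itself)
    ranked = sorted(enumerate(similarity[idx]), key=lambda t: t[1], reverse=True)[1:6]
    recommended_medicines = [medicines.iloc[i]["name"] for i, _ in ranked]
    return medicine, recommended_medicines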

CHAPTER 4: RESULT AND DISCUSSION

4.1. Implementation details and Issues


⮚ We have compiled a unique dataset to develop a recommendation system for suggesting medicine, diet,
and exercise plans. To ensure the information was accurate, we gathered data from multiple sources.

⮚ During data collection, we faced challenges in obtaining correct findings for some cases, and we faced
similar difficulties in obtaining correct information for the exercise recommendations.

⮚ One of the limitations we encountered while working on this project was the limited GPU power
available. As a result, the execution time for the convolutional neural network (CNN) models was
considerably extended. This delay in processing posed a challenge in completing the execution of the
models efficiently.

4.2. Evaluation Parameters


We evaluated all the models on the following evaluation parameters:
⮚ Accuracy:
Accuracy is a widely used metric for evaluating classification models, which assesses the model's ability to
make correct predictions. In the context of machine learning, classification refers to the task of assigning
predefined labels or categories to input data based on patterns and features extracted from the data.
Accuracy, in simple terms, measures the proportion of correct predictions made by a classification model.

To understand accuracy, let's consider an example. Suppose we have a dataset consisting of various images
of fruits, and our task is to build a model that can classify these images into different categories such as
apples, oranges, and bananas. We train our model using a portion of the dataset, known as the training set,
and then evaluate its performance on a separate portion called the test set.

During evaluation, the model makes predictions on the test set by assigning a label (e.g., apple, orange, or
banana) to each image based on the learned patterns and features. Accuracy is calculated by comparing
these predicted labels to the true labels in the test set. If the predicted labels match the true labels, it is
considered a correct prediction. The accuracy is then calculated as the ratio of correct predictions to the
total number of predictions.

Mathematically, accuracy is defined as follows:

Accuracy = (Number of correct predictions) / (Total number of predictions)

For example, let's say our model predicted the labels for 100 images in the test set, and it correctly
classified 85 of them. In this case, the accuracy would be:

Accuracy = 85 / 100 = 0.85 or 85%

An accuracy of 85% indicates that our model accurately predicted the fruit category for 85% of the images
in the test set.

Accuracy is a straightforward and intuitive metric that provides an overall measure of a classification
model's performance. However, it has some limitations. Accuracy alone may not provide a complete
picture of the model's effectiveness, especially when dealing with imbalanced datasets, where the number
of instances in different classes is significantly different. In such cases, a high accuracy score can be
misleading, as the model might be performing well on the majority class but poorly on the minority class.

Therefore, it is essential to consider other evaluation metrics and examine the confusion matrix, precision,
recall, and F1 score to gain a comprehensive understanding of a classification model's performance. These
additional metrics provide insights into different aspects of the model's predictions and help assess its
strengths and weaknesses in handling different classes within the dataset.

In summary, accuracy is a valuable metric for evaluating classification models as it measures the proportion
of correct predictions made by the model. While it provides a useful overview of the model's performance,
it is important to consider other evaluation metrics to gain a more comprehensive understanding of its
effectiveness, particularly in scenarios involving imbalanced datasets.

⮚ Precision:
Precision is a fundamental metric used to evaluate the performance of classification models, particularly in
scenarios where correctly identifying positive instances is crucial. It quantifies the model's ability to make
accurate positive predictions and is calculated as the ratio of true positives to the total number of positive
predictions.

To comprehend precision, let's delve into its definition and calculation. In a binary classification problem,
where we have two classes, typically referred to as positive and negative, precision focuses solely on the
positive class. It measures the proportion of correctly predicted positive instances out of all the instances
predicted as positive by the model.

Mathematically, precision is defined as follows:

Precision = True Positives / (True Positives + False Positives)

True positives (TP) refer to the number of instances correctly identified as positive by the model, while
false positives (FP) represent the number of instances incorrectly classified as positive when they were
actually negative. In other words, true positives are the instances the model correctly labels as positive, and
false positives are the instances the model mistakenly labels as positive.

To illustrate precision, let's consider an example. Suppose we have a classification model that predicts
whether an email is spam or not. In the evaluation phase, the model predicts 100 emails as spam, out of
which 85 are indeed spam (true positives), while 15 are legitimate emails misclassified as spam (false
positives). In this scenario, the precision would be calculated as:

Precision = 85 / (85 + 15) = 85 / 100 = 0.85 or 85%

An 85% precision indicates that among all the emails predicted as spam by the model, 85% of them are
actually spam.

Precision is a crucial metric in situations where false positives have significant consequences or high costs.
For example, in medical diagnostics, correctly identifying positive cases (e.g., detecting a disease) is
essential to avoid false alarms or unnecessary treatments. In such cases, precision provides a measure of the
model's accuracy in correctly classifying positive instances and minimizing false positives.

However, it's important to note that precision alone might not provide a complete assessment of the model's
performance. Precision does not consider instances that were classified as negative but were actually
positive (false negatives). To gain a comprehensive understanding, precision should be analyzed in
conjunction with other evaluation metrics such as recall, F1 score, and the confusion matrix.

In summary, precision measures the accuracy of positive predictions made by a classification model. It
quantifies the proportion of true positives among all the instances predicted as positive. Precision is
particularly useful when false positives are costly or have significant consequences. However, it should be
considered alongside other metrics to obtain a more holistic evaluation of the model's performance.

⮚ Recall:
Recall, also known as sensitivity or true positive rate, is a fundamental metric used in classification models
to assess their ability to correctly identify positive instances. Its calculation involves determining the
proportion of true positives identified by the model out of all the actual positive instances present in the
dataset.

In the context of binary classification, recall focuses on the positive class and measures the model's
capacity to capture all positive instances without missing any. It essentially evaluates the completeness of
the model's predictions regarding the positive class.

Mathematically, recall can be defined as follows:

Recall = True Positives / (True Positives + False Negatives)

True positives (TP) represent the instances correctly identified as positive by the model, while false
negatives (FN) indicate the instances that were mistakenly classified as negative when they were actually
positive. True positives are the instances the model accurately labels as positive, whereas false negatives
are the instances the model erroneously labels as negative.

For a more concrete illustration of recall, let's consider an example. Imagine a model designed to predict
whether an email is spam or not. During the evaluation, the model encounters 100 actual spam emails, of
which it correctly identifies 85 as spam (true positives) but fails to detect 15 spam emails (false negatives).
In this scenario, the recall can be calculated as follows:

Recall = 85 / (85 + 15) = 85 / 100 = 0.85 or 85%

An 85% recall indicates that the model successfully captured 85% of the actual spam emails present in the
dataset.

Recall holds particular significance in situations where missing positive instances (false negatives) can
have serious consequences. In medical diagnostics, for instance, accurately detecting positive cases (e.g.,
identifying a disease) is crucial for timely treatment or intervention. By assessing recall, one can evaluate
the model's ability to correctly identify positive instances, thereby reducing false negatives and maximizing
the completeness of predictions.

However, it's worth noting that recall alone may not provide a comprehensive evaluation of the model's
performance. It does not consider instances that were classified as positive but were actually negative (false
positives). For a more holistic understanding, recall should be analyzed alongside other evaluation metrics
such as precision, the F1 score, and the confusion matrix.

In summary, recall quantifies the ability of a classification model to capture all the actual positive instances
in the dataset. It represents the proportion of true positives out of all the positive instances present. Recall
proves particularly useful in scenarios where missing positive instances can have significant consequences.
Nevertheless, it should be considered in conjunction with other metrics to obtain a comprehensive
evaluation of the model's performance.

⮚ F1 score:
The F1 score, also known as the F Score or the F Measure, is a metric used in classification models to
assess the balance between precision and recall. It provides a single value that combines the two metrics
into a comprehensive evaluation of the model's performance.

Precision and recall are two fundamental evaluation metrics in binary classification models. Precision
measures the accuracy of the positive predictions made by the model, while recall measures the model's
ability to capture all the actual positive instances. However, these metrics can sometimes be in conflict with
each other. For example, increasing precision may result in a decrease in recall, and vice versa.

The F1 score addresses this trade-off by taking into account both precision and recall. It calculates the
harmonic mean of precision and recall to provide a balanced measure of the model's performance. The
harmonic mean is used instead of the arithmetic mean to ensure that the F1 score is more sensitive to low
values of precision or recall.

Mathematically, the F1 score can be calculated using the following formula:

F1 Score = 2 * (Precision * Recall) / (Precision + Recall)

The F1 score ranges from 0 to 1, with a higher value indicating better performance. A perfect F1 score of 1
means that the model has achieved both high precision and high recall simultaneously.

By considering both precision and recall, the F1 score provides a comprehensive evaluation of the model's
ability to balance the trade-off between correctly identifying positive instances and minimizing false
positives or false negatives.

For example, let's suppose we have a classification model that predicts whether a patient has a certain
disease or not. Precision would measure the proportion of correctly predicted positive cases out of all the
predicted positive cases. Recall would measure the proportion of correctly predicted positive cases out of
all the actual positive cases in the dataset. The F1 score combines these two metrics to provide a single
measure of the model's performance, considering both the accuracy of positive predictions and the
completeness of capturing positive cases.

It's important to note that the F1 score is most useful when there is an imbalance between the positive and
negative instances in the dataset. In such cases, accuracy alone may not be an appropriate evaluation
metric, as a model that always predicts the majority class (negative) could achieve a high accuracy while
performing poorly on the positive class. The F1 score takes into account this class imbalance and provides a
more balanced assessment of the model's performance.

In summary, the F1 score is a metric that combines precision and recall to provide a balanced evaluation of
a classification model's performance. It is particularly useful in situations where there is a class imbalance.
By considering both precision and recall, the F1 score offers insights into the model's ability to achieve
accuracy in positive predictions while capturing all the actual positive instances.
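
To make the relationship between these metrics concrete, the short sketch below computes accuracy, precision, recall, and F1 score with scikit-learn on a small set of hypothetical labels and predictions:

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]   # hypothetical ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]   # hypothetical model predictions

print(confusion_matrix(y_true, y_pred))
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))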

⮚ Time required:
The time required to execute a model refers to the amount of time it takes for the model to process and
provide predictions or results. It is an important aspect to consider when evaluating the efficiency and
practicality of a model in real-world scenarios.

The time required for model execution can vary depending on several factors:

1. Model Complexity: The complexity of the model architecture and the algorithms used can significantly
impact the execution time. Models with a large number of parameters or complex computations may
require more time to process data and generate predictions.

2. Dataset Size: The size of the dataset on which the model operates can influence the execution time.
Larger datasets may require more time to process, especially if the model needs to perform computations
on a significant amount of data.

3. Hardware Infrastructure: The hardware on which the model is running plays a crucial role in determining
the execution time. High-performance hardware, such as GPUs (Graphics Processing Units) or specialized
hardware accelerators, can significantly speed up the model's execution compared to using regular CPUs
(Central Processing Units).

4. Optimization Techniques: Various optimization techniques can be applied to reduce the execution time
of the model. These techniques include model compression, quantization, or parallelization, which aim to
make the model more efficient and reduce computational requirements.

5. Implementation Efficiency: The efficiency of the implementation code can impact the execution time.
Well-optimized code and efficient algorithms can help reduce unnecessary computations and improve the
overall speed of the model.

It is important to note that the time required for model execution is not an evaluation metric used to assess
the model's performance or accuracy. Instead, it focuses on the practicality and efficiency of using the
model in real-world applications. In some cases, faster execution times may be desirable, especially in
time-sensitive applications where quick predictions are required.

When evaluating the time required for model execution, it is essential to strike a balance between execution
speed and model accuracy. Sometimes, models with longer execution times may provide better
performance or higher accuracy compared to faster models. Therefore, it is crucial to consider the specific
requirements of the application and choose a model that meets both the time constraints and the desired
performance levels.

In summary, the time required for model execution refers to the duration it takes for the model to process
data and provide predictions. Factors such as model complexity, dataset size, hardware infrastructure,
optimization techniques, and implementation efficiency can influence the execution time. While faster
execution times can be advantageous in time-sensitive applications, it is important to consider the trade-off
between execution speed and model accuracy to choose the most suitable model for a given application.

4.3. Results:

A) Logistic Regression

Evaluation parameter  Heart Disease      Liver Disease      Diabetes           Stroke             Breast Cancer
Training score        0.80295            0.73846            0.7899             0.95033            0.98950
Test score            0.9                0.73056            0.7677             0.95317            0.98404
Confusion matrix      [[40 4][6 50]]     [[129 40][12 12]]  [[131 42][17 64]]  [[1608 79][0 0]]   [[121 2][1 64]]
Accuracy              0.9                0.73056            0.7677             0.95317            0.98404
Precision             0.92592            0.23076            0.6038             0.0                0.96969
Recall                0.89285            0.5                0.7901             NaN                0.98461
F1 score              0.90909            0.31578            0.6845             NaN                0.97709
Specificity           0.90909            0.76331            0.7572             0.95317            0.98373
Time required (s)     0.01994            0.04154            0.897              0.02609            0.03567

TABLE 4.1: Result of Logistic Regression on disease datasets

B) Random forest classifier

Evaluation parameter  Heart Disease      Liver Disease      Diabetes           Stroke             Breast Cancer
Training score        1.0                1.0                1.0                1.0                1.0
Test score            0.87               0.70984            0.8780             0.95139            0.97872
Confusion matrix      [[39 6][7 48]]     [[115 30][26 22]]  [[136 19][12 87]]  [[1605 79][3 0]]   [[121 3][1 63]]
Accuracy              0.87               0.70984            0.8780             0.95139            0.97872
Precision             0.88888            0.42307            0.8208             0.0                0.95454
Recall                0.87272            0.45833            0.8788             0.0                0.98437
F1 score              0.88073            0.43999            0.8488             NaN                0.96923
Specificity           0.86666            0.79310            0.8774             0.95308            0.97580
Time required (s)     0.41450            0.64461            0.2533             0.46861            0.55657

TABLE 4.2: Result of Random forest classifier on disease datasets


C) KNN classifier

Evaluation parameter  Heart Disease      Liver Disease      Diabetes           Stroke             Breast Cancer
Training score        0.85714            0.78205            0.8599             0.95354            0.98162
Test score            0.88               0.69430            0.8071             0.95080            0.95744
Confusion matrix      [[43 9][3 45]]     [[117 35][24 17]]  [[127 28][21 78]]  [[1604 79][4 0]]   [[119 5][3 61]]
Accuracy              0.88               0.69430            0.8071             0.95080            0.95744
Precision             0.83333            0.32692            0.7358             0.0                0.92424
Recall                0.9375             0.41463            0.7879             0.0                0.95312
F1 score              0.88235            0.36559            0.7610             NaN                0.93846
Specificity           0.82692            0.76973            0.8194             0.95306            0.95967
Time required (s)     0.03473            0.05939            0.406              0.40953            0.46738

TABLE 4.3: Result of KNN classifier on disease datasets

D) Decision tree classifier

Evaluation parameter  Heart Disease      Liver Disease      Diabetes           Stroke             Breast Cancer
Training score        1.0                1.0                1.0                1.0                1.0
Test score            0.68               0.68393            0.8189             0.91049            0.93085
Confusion matrix      [[34 20][12 34]]   [[109 29][32 23]]  [[131 29][17 77]]  [[1522 65][86 14]] [[115 6][7 60]]
Accuracy              0.68               0.68393            0.8189             0.91049            0.93085
Precision             0.62962            0.44230            0.7264             0.17721            0.90909
Recall                0.73913            0.41818            0.8191             0.14               0.89552
F1 score              0.68               0.42990            0.7700             0.15642            0.90225
Specificity           0.62962            0.78985            0.8188             0.95904            0.95041
Time required (s)     0.00705            0.00845            0.50               0.01642            0.03265

TABLE 4.4: Result of Decision tree classifier on disease datasets

E) Support vector machine

Evaluation parameter  Heart Disease      Liver Disease      Diabetes           Stroke             Breast Cancer
Training score        0.90640            0.71538            0.8969             0.95121            0.98425
Test score            0.86               0.72538            0.8228             0.95317            0.98404
Confusion matrix      [[40 8][6 46]]     [[140 52][1 0]]    [[129 26][19 80]]  [[1608 79][0 0]]   [[121 2][1 64]]
Accuracy              0.86               0.72538            0.8228             0.9531             0.98404
Precision             0.85185            0.0                0.7547             0.0                0.96969
Recall                0.88461            0.0                0.8081             0.0                0.98461
F1 score              0.86792            NaN                0.7805             NaN                0.97709
Specificity           0.83333            0.72916            0.8323             0.95317            0.98373
Time required (s)     0.02066            0.05120            0.370              0.38544            0.04567

TABLE 4.5: Result of Support vector machine on disease datasets

F) Gaussian NB

Evaluation parameter  Heart Disease      Liver Disease      Diabetes           Stroke              Breast Cancer
Training score        0.81773            0.57692            0.7763             0.86882             0.92650
Test score            0.9                0.48704            0.7480             0.86010             0.93085
Confusion matrix      [[41 5][5 49]]     [[45 3][96 49]]    [[123 39][25 67]]  [[1413 41][195 38]] [[113 4][9 62]]
Accuracy              0.9                0.48704            0.7480             0.86010             0.93085
Precision             0.90740            0.94230            0.6321             0.48101             0.93939
Recall                0.90740            0.33793            0.7283             0.16309             0.87323
F1 score              0.90740            0.49746            0.6768             0.24358             0.90510
Specificity           0.89130            0.9375             0.7593             0.97180             0.96581
Time required (s)     0.02006            0.00352            0.0030             0.01135             0.01562

TABLE 4.6: Result of Gaussian NB on disease datasets

G) Bernoulli NB

Evaluation parameter  Heart Disease      Liver Disease      Diabetes           Stroke             Breast Cancer
Training score        0.82758            0.66666            0.8463             0.94449            0.92388
Test score            0.89               0.63212            0.8622             0.94487            0.96276
Confusion matrix      [[40 5][6 49]]     [[84 14][57 38]]   [[129 16][19 90]]  [[1589 74][19 5]]  [[118 3][4 63]]
Accuracy              0.89               0.63212            0.8622             0.94487            0.96276
Precision             0.90740            0.73076            0.8491             0.06329            0.95454
Recall                0.89090            0.4                0.8257             0.20833            0.94029
F1 score              0.89908            0.51700            0.8372             0.09708            0.94736
Specificity           0.88888            0.85714            0.8897             0.95550            0.97520
Time required (s)     0.01934            0.00299            0.31               0.00991            0.01704

TABLE 4.7: Result of Bernoulli NB on disease datasets

H) Gradient boosting classifier

Evaluation parameter  Heart Disease      Diabetes           Stroke                  Breast Cancer
Training score        1.0                0.9961             0.96026                 1.0
Test score            0.83               0.8898             0.950207468879668       0.97872
Confusion matrix      [[40 11][6 43]]    [[137 17][11 89]]  [[1602 78][6 1]]        [[122 4][0 62]]
Accuracy              0.83               0.8898             0.950207468879668       0.97872
Precision             0.79629            0.8396             0.012658227848101266    0.93939
Recall                0.87755            0.8900             0.14285714285714285     1.0
F1 score              0.83495            0.8641             0.023255813953488372    0.96875
Specificity           0.78431            0.8896             0.95357                 0.96825
Time required (s)     0.22320            0.1917             0.34865427017211914     1.33526

TABLE 4.8: Result of Gradient boosting classifier on disease datasets

I) CNN model

Evaluation parameter  Pneumonia Disease
Accuracy              84.74%
Precision             77.47% (class 0), 89.76% (class 1)
Recall                83.76% (class 0), 85.38% (class 1)
F1 score              80.49% (class 0), 87.52% (class 1)

TABLE 4.9: Result of CNN model on pneumonia disease dataset

J) VGG16 model

Evaluation parameter  Pneumonia Disease
Test accuracy         84.74%
Train accuracy        25.88%
Precision             0% (class 0), 100% (class 1)
Recall                0% (class 0), 0.23% (class 1)

TABLE 4.10: Result of VGG16 model on pneumonia disease datasets


1) Based on the results above, Logistic Regression is the best algorithm for heart disease prediction: it has
the highest accuracy (0.9), a good balance between precision (0.92592) and recall (0.89285), and a low
training and testing time compared with the other algorithms.

2) For liver disease prediction, Logistic Regression again performs best, with the highest test accuracy
(0.73056) and a low training and testing time compared with the other algorithms.

3) For diabetes prediction, the Gradient boosting classifier is the best algorithm, with the highest accuracy
(0.8898) and a good balance between precision (0.8396) and recall (0.8900).

4) For breast cancer prediction, Logistic Regression is the best algorithm, with the highest accuracy
(0.98404), a good balance between precision (0.96969) and recall (0.98461), and a low training and testing
time compared with the other algorithms.

5) For stroke prediction, Logistic Regression performs best, with the highest test accuracy of 0.95317.

6) Comparing the deep learning models, the CNN model (84.78% accuracy) performs better than the VGG16
model (37.50%), whose recall is close to zero for both classes; the CNN model also has better values on the
other performance metrics.

CHAPTER 5: CONCLUSION AND FUTURE WORKS

5.1. Findings

⮚ Through the literature survey we found that most of the models already built address the prediction of a
single disease; only a few cover more than one.

⮚ When it comes to recommending medicine, diet, and exercise, only a few systems offer such
recommendations, and even those cover only one or a few diseases and usually recommend just one of the three.

⮚ When the data is linearly separable, logistic regression was found to be the best option for prediction.

⮚ There is no sufficient, well-structured dataset that covers diseases together with their symptoms,
severity, descriptions, medicines, diets, and exercises.

5.2. Future Work


In the future, we can add more data to our current dataset to cover as many diseases as possible and to
recommend a wider range of medicines, diets, and exercises. The system could also recommend hospitals and
clinics in the patient's area, suggest which doctor to consult after a particular disease is predicted, and
follow up with the patient after treatment.

References
[1] Sharma, V. and Singh Samant, S. (2022) Health recommendation system by using Deep Learning and
fuzzy technique, SSRN. Available at: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4157328
(Accessed: November 19, 2022)

[2] https://iopscience.iop.org/article/10.1088/1757-899X/11

[3] Kim, J.-H. et al. (no date) Design of diet recommendation system for healthcare service based on User
Information, IEEE Xplore. Available at: https://ieeexplore.ieee.org/abstract/document/5367898 (Accessed:
November 19, 2022).

[4] Husain, W. et al. (no date) Application of data mining techniques in a personalized diet recommendation
system for cancer patients, IEEE Xplore. Available at:
https://ieeexplore.ieee.org/abstract/document/6163724 (Accessed: November 19, 2022).
[5] Maurya, A. et al. (no date) Chronic kidney disease prediction and recommendation of suitable diet plan
by using machine learning, IEEE Xplore. Available at:
https://ieeexplore.ieee.org/abstract/document/8946029 (Accessed: November 19, 2022).

[6] https://www.irjet.net/archives/V9/i7/IRJET-V9I7205.pdf
da Silva, B.A. and Krishnamurthy, M. (2016) The alarming reality of medication error: A patient case and
review of Pennsylvania and National Data, Journal of community hospital internal medicine perspectives.
U.S. National Library of Medicine. Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5016741/
(Accessed: November 19, 2022).

[7] https://www.interscience.in/cgi/viewcontent.cgi?article=1109&context=gret

[8] https://www.ijstr.org/final-print/feb2020/Medicine-Recommendation-System-Based-On-Patient-
Reviews.pdf

[9] Sood, G. and Raheja, N. (2013) Performance evaluation of health recommendation system ... -
iopscience. Available at: https://iopscience.iop.org/article/10.1088/1757-899X/1131/1/012013 (Accessed:
19 November 2022).
