
DISEASE PREDICTION USING MACHINE LEARNING

Submitted for partial fulfilment of the requirements

for the award of degree of

BACHELOR OF TECHNOLOGY

IN

INFORMATION TECHNOLOGY

Submitted By

K.RAKESH(17BQ1A1264)
K.SAI SRI HARSHA(17BQ1A1280)
N.PHALGUNA SAI(17BQ1A12B4)
K.MALLIKARJUNA(17BQ1A1263)
Under the supervision of
B. LAKSHMI PRAVEENA, M. TECH (PhD)
Associate professor, Dept of IT

VASIREDDY VENKATADRI INSTITUTE OF TECHNOLOGY

NAMBUR (V), PEDAKAKANI (M), GUNTUR-522 508

Jawaharlal Nehru Technological University, Kakinada, AP, India.

July 2021
VASIREDDY VENKATADRI INSTITUTE OF TECHNOLOGY: NAMBUR

BONAFIDE CERTIFICATE
Certified that this project report “DISEASE PREDICTION USING MACHINE LEARNING” is the
bonafide work of K.RAKESH (17BQ1A1264), K.SAI SRI HARSHA (17BQ1A1280), N.PHALGUNA
SAI (17BQ1A12B4) and K.MALLIKARJUNA (17BQ1A1263), who carried out the project under my
guidance during the year 2021 towards partial fulfilment of the requirements of the Degree of Bachelor
of Technology in Information Technology from Jawaharlal Nehru Technological University, Kakinada.
The results embodied in this report have not been submitted to any other University for the award of any
degree.

Signature of the Head of the Department Signature of the Supervisor


A. KALAVATHI Ph.D B.Lakshmi Praveena M. TECH(Ph.D.)
HEAD OF THE DEPARTMENT GUIDE
Department of Information Technology Department of Information Technology

21 June 2021

External Viva Conducted on

Internal Examiner External Examiner


VASIREDDY VENKATADRI INSTITUTE OF TECHNOLOGY:: NAMBUR

CERTIFICATE OF AUTHENTICATION
I/We solemnly declare that this project report “DISEASE PREDICTION USING MACHINE LEARNING”
is the bonafide work done purely by me/us, carried out under the supervision of Ms. B.Lakshmi
Praveena, towards partial fulfilment of the requirements of the Degree of Bachelor of Technology
in Information Technology from Jawaharlal Nehru Technological University, Kakinada during the year
2020-21. It is further certified that this work has not been submitted, either in part or in full, to any
other department of the Jawaharlal Nehru Technological University, or any other University, institution
or elsewhere, or for publication in any form.

Signature of the Student

K.Rakesh

K.Sai Sri Harsha

N.Phalguna Sai

K.Mallikarjuna
ACKNOWLEDGEMENT
We take this opportunity to express our deepest gratitude and appreciation to all those who
made this project work easier with words of encouragement, motivation, discipline, and faith, and who
offered different places to look to expand our ideas and helped us towards the successful completion
of this project work. First and foremost, we express our deep gratitude to Mr. Vasireddy VidyaSagar,
Chairman, Vasireddy Venkatadri Institute of Technology, for providing the necessary facilities
throughout the Information Technology program. We express our sincere thanks to Dr. Y. Mallikarjuna
Reddy, Principal, Vasireddy Venkatadri Institute of Technology, for his constant support and cooperation
throughout the Information Technology program. We express our sincere gratitude to Dr. A. Kalavathi,
Professor & HOD, Information Technology, Vasireddy Venkatadri Institute of Technology, for her constant
encouragement, motivation, and faith, and for offering different places to look to expand our ideas. We
would like to express our sincere gratitude to our guide, Ms. B. Lakshmi Praveena, for her insightful
advice, motivating suggestions, invaluable guidance, help, and support in the successful completion of
this project. We would also like to thank the teaching and non-teaching staff of the Department of
Information Technology, VVIT, for their invaluable help and support.

K.RAKESH - 17BQ1A1264

K.SAI SRI HARSHA - 17BQ1A1280

N.PHALGUNA SAI - 17BQ1A12B4

K.MALLIKARJUNA - 17BQ1A1263
ABSTRACT

Disease Prediction using Machine Learning is a system that predicts a disease from the symptoms
the user enters and returns accurate results based on that information. It is intended for cases
where the patient's condition is not very serious and the user just wants to know the type of
disease. The health industry plays a major role in curing patients, and this system can help it
in two ways. A user who does not want to go to the hospital can enter their symptoms and other
useful information to find out which disease they may be suffering from. The health industry
can also benefit: staff can ask the user for their symptoms, enter them into the system, and
within a few seconds report a disease with a reasonable degree of accuracy. The system is built
entirely with machine learning and the Python programming language, with a Tkinter interface,
and it predicts the disease using a dataset previously made available by hospitals.
Table of Contents
Title
CHAPTER 1 INTRODUCTION
1.1 Introduction
1.2 Aim and Scope
1.3 Goals and Objectives
1.4 Existing System
1.5 Proposed System
1.6 Feasibility Study
1.6.1 Technical Feasibility
1.6.2 Operational Feasibility
CHAPTER 2 REQUIREMENT SPECIFICATIONS
2.1 Description of the Problem
2.2 Proposed Solution
2.3 System Requirements
2.3.1 Hardware Requirements
2.3.2 Software Requirements
2.3.3 Functional Requirements
2.3.4 Non Functional Requirements
2.3.5 Performance Requirements
2.3.6 Modules
2.4 System Analysis Method
2.4.1 Use Case Diagram
2.4.2 Activity Diagram
2.5 System Design Methods
2.5.1 Class Diagram
2.5.2 Sequence Diagram
CHAPTER 3 SYSTEM DESIGN
3.1 Introduction
3.2 System Architectural Design
3.2.1 Chosen System Architecture
3.2.2 System Interface Descriptions
CHAPTER 4 SYSTEM IMPLEMENTATION
4.1 Tools and Technologies Used
4.1.1 Technologies
4.1.2 Tools
4.2 Screenshots
CHAPTER 5 TESTING
5.1 Introduction
5.2 Testing Methods
5.3 Test Cases
CHAPTER 6 CONCLUSION
6.1 Conclusion
6.2 Future Enhancement
CHAPTER 7 APPENDICES
7.1 Code Snippet
7.2 Bibliography
LIST OF FIGURES
Fig no Figure Name

1. Use Case Diagram
2. Activity Diagram
3. Class Diagram
4. Sequence Diagram
5. System Architecture
CHAPTER 1

INTRODUCTION

1.1 Introduction

The world is passing through a purple patch of technology, in which the demand for
intelligence and accuracy keeps increasing. People today are heavily attached to the Internet
but are not concerned about their personal health. In the 21st century we are surrounded by
technology, which has become a constituent of our day-to-day lives, and with it we try to look
after both our own health and our hard-earned valuables. Even so, people avoid going to the
hospital for small problems, which may become a major disease in the future.

Our basic idea is to develop a system that predicts a disease from the symptoms the user
gives as input. The system compares the symptoms with the datasets stored in its database and
applies algorithms to predict the disease. The main feature is machine learning: we use the
Naïve Bayes, Decision Tree, and Random Forest algorithms to predict the disease, and we compare
the accuracy of the three algorithms to find which gives the faster and more efficient result,
which in turn makes the prediction more accurate.

1.2 Aim and Scope

The aim of our project is to predict a patient's disease accurately from the symptoms
given by the user. Many people go to hospitals or clinics to find out how their health is and
how much it has improved over a given period, but they have to travel to get their answers.
Sometimes they do not get results at all, for reasons such as the doctor being on leave or bad
weather keeping the doctor away from the hospital. To avoid these problems and the resulting
confusion, we are building a system that helps anyone who needs to know the condition of their
health. In addition, a person who has been observing a few symptoms but is not sure which
disease they point to may ignore them, and this can lead to various diseases in the future.
Identifying the disease in the early stages of the symptoms avoids this, so disease prediction
can help a wide range of people, from children and teenagers to adults and senior citizens.

1.3 Goals and Objectives

The purpose of this project, “Disease Prediction Using Machine Learning”, is to predict a
patient's disease accurately from their general information and their symptoms. We compare
this information with our existing patient datasets and predict the disease the patient is
suffering from. If the prediction is made in the early stages of the disease, and the other
necessary measures are taken, the disease can be cured. More generally, this prediction system
can be very useful in the health industry: if the industry adopts it, the workload of doctors
can be reduced and they can predict a patient's disease more easily. The general purpose of
the system is to provide predictions for commonly occurring diseases that, when left unchecked
or ignored, can turn fatal and cause a lot of problems for patients and their family members.
The system predicts the most probable disease based on the symptoms. The health industry is
information rich yet knowledge poor, and it is a very vast industry with a lot of work still to
be done. With the help of these algorithms, techniques, and methodologies, we have built this
project to help people in need.

1.4 Existing System

Prediction using traditional methods and models involves various risk factors and draws
on datasets, programs, and many other components. Patients are classified as high-risk or
low-risk on the basis of tests done in groups. However, these models are valuable only in
clinical situations and not in large industry sectors. To bring disease prediction to the
wider range of health-related industries, we have used machine learning, and supervised
learning methods in particular, to build the prediction system. After researching and comparing
machine learning algorithms, we concluded that Decision Tree, Naïve Bayes, and Random Forest
are all important in building a system that predicts the disease a patient is suffering from.
To evaluate them we used performance measures such as ROC, the Kappa statistic, RMSE, and MAE,
among other tools. We also made predictions using techniques such as neural networks and, after
experimentation and verification of the results, concluded that they can reach an accuracy of
up to 90%. Patient statistics, results, and disease history are recorded in electronic health
records (EHR), which makes it possible to identify potential data-centric solutions and
reduces the cost of medical case studies. Existing systems can predict the disease but not its
subtype, and they fail to predict the condition of the patient; their predictions have been
indefinite and non-specific.
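
As an illustration of the performance measures named above, the sketch below computes accuracy, the Kappa statistic (Cohen's kappa), and RMSE in plain Python. It is a minimal sketch and not the project's evaluation code: the function names and the sample labels are hypothetical, and ROC analysis is left out because it needs ranked probability scores.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement between true and predicted labels, corrected for chance."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: both sides pick the same label independently.
    p_expected = sum(
        (true_counts[label] / n) * (pred_counts[label] / n)
        for label in true_counts
    )
    if p_expected == 1.0:
        # Only one label occurs on both sides; treated here as full agreement.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


def rmse(y_true, y_pred):
    """Root mean squared error between numeric targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical labels, for illustration only.
actual = ["flu", "flu", "cold", "cold", "flu", "cold"]
predicted = ["flu", "cold", "cold", "cold", "flu", "flu"]
print(accuracy(actual, predicted), cohen_kappa(actual, predicted))
```

Kappa is useful alongside accuracy because it discounts the agreement a classifier would reach by chance, which matters when some diseases are much more frequent than others in the dataset.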

1.5 Proposed System

The proposed system uses a number of techniques, algorithms, and tools to predict a
patient's disease from their symptoms by comparing those symptoms with the system's previously
available dataset. From this comparison the system predicts the patient's disease along with a
percentage estimate. The dataset and the symptoms go to the prediction model of the system,
where the data is pre-processed for future reference; feature selection is then done by the
user, who enters the various symptoms. The data is then classified with the help of algorithms
such as Decision Tree, Naïve Bayes, and Random Forest. Next, the data goes to the
recommendation model, which shows the risk analysis involved and provides a probability
estimate, for example how the system behaves when n predictions are made. It also makes
recommendations for the patient based on the final result and the symptoms, such as what to use
and what not to use. We combine structured and unstructured data for the overall risk analysis
required to predict the disease. Using structured analysis, we can identify the chronic
diseases in a particular region and a particular community. In unstructured analysis, the
features are selected automatically with the help of algorithms and techniques. The system
takes symptoms from the user and predicts the disease from those symptoms and from the previous
datasets. It also helps in the continuous evaluation of viral diseases, heart rate, blood
pressure, sugar level, and other measures held in the system, and along with other external
symptoms it predicts the appropriate disease.
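
The classification and probability estimation steps described above can be sketched with a small Naïve Bayes classifier written from scratch. This is only an illustration under stated assumptions: symptoms are assumed to be encoded as 0/1 values, Laplace smoothing is our choice, and the function names and toy data are hypothetical rather than taken from the project's dataset or code.

```python
import math
from collections import defaultdict


def train_naive_bayes(rows, labels):
    """Estimate class priors and per-symptom probabilities from 0/1 rows."""
    n_features = len(rows[0])
    class_counts = defaultdict(int)
    symptom_counts = defaultdict(lambda: [0] * n_features)
    for row, label in zip(rows, labels):
        class_counts[label] += 1
        for i, value in enumerate(row):
            symptom_counts[label][i] += value
    model = {}
    for label, count in class_counts.items():
        prior = count / len(rows)
        # Laplace smoothing keeps every probability strictly between 0 and 1.
        probs = [(symptom_counts[label][i] + 1) / (count + 2)
                 for i in range(n_features)]
        model[label] = (prior, probs)
    return model


def predict_proba(model, row):
    """Return {disease: posterior probability} for one 0/1 symptom row."""
    log_scores = {}
    for label, (prior, probs) in model.items():
        score = math.log(prior)
        for value, p in zip(row, probs):
            score += math.log(p if value else 1 - p)
        log_scores[label] = score
    # Normalise in a numerically stable way: subtract the maximum first.
    top = max(log_scores.values())
    exp_scores = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {label: v / total for label, v in exp_scores.items()}


# Toy data (hypothetical). Columns: fever, cough, rash.
rows = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
labels = ["flu", "flu", "measles", "measles"]

model = train_naive_bayes(rows, labels)
probabilities = predict_proba(model, [1, 1, 0])
print(max(probabilities, key=probabilities.get), probabilities)
```

The posterior probabilities returned by `predict_proba` are one way to produce the kind of probability estimate the recommendation model is described as showing.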

1.6 Feasibility Study


1.6.1 Technical Feasibility

The code base is designed so that existing code can be reused, instead of new code being
written, when the software is moved from one environment to another. The project can be executed
under different operating conditions provided the minimum configuration is met. Only system
files and dependent assemblies would have to be configured in such a case.

1.6.2 Operational Feasibility

The system is capable of handling increased total throughput under an increased load when
resources (typically hardware) are added. The system can work normally under conditions such as
low bandwidth and a large number of users.
CHAPTER 2
REQUIREMENT SPECIFICATIONS

2.1 Description of the Problem

Nowadays the health industry faces various problems with machines or devices that give
wrong or unacceptable results. To avoid such results and obtain the correct and desired ones,
we are building a program that gives accurate predictions based on the information provided by
the user and on the datasets available on that machine. The health industry is information
rich yet knowledge poor, and it is a very vast industry with a lot of work still to be done.
With the help of these algorithms, techniques, and methodologies, we have built this project to
help people in need. Many people go to hospitals or clinics to find out how their health is
and how much it has improved over a given period, but they have to travel to get their answers.
Sometimes they do not get results at all, for reasons such as the doctor being on leave or bad
weather keeping the doctor away from the hospital. To avoid these problems and the resulting
confusion, we are building a system that helps anyone who needs to know the condition of their
health. In addition, a person who has been observing a few symptoms but is not sure which
disease they point to may ignore them, and this can lead to various diseases in the future.
Identifying the disease in the early stages of the symptoms avoids this, so disease prediction
can help a wide range of people, from children and teenagers to adults and senior citizens.
2.2 Proposed Solution

We have used a number of techniques, algorithms, and tools to build a system that predicts
a patient's disease from their symptoms by comparing those symptoms with the system's
previously available dataset. From this comparison the system predicts the patient's disease
along with a percentage estimate. The dataset and the symptoms go to the prediction model of
the system, where the data is pre-processed for future reference; feature selection is then
done by the user, who enters the various symptoms. The data is then classified with the help of
algorithms such as Decision Tree, Naïve Bayes, and Random Forest.

The system takes symptoms from the user and predicts the disease from those symptoms and
from the previous datasets; along with other external symptoms, it predicts the appropriate
disease.
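
Before any classifier can use them, the symptoms entered by the user have to be turned into numeric features. The sketch below shows one common way to do this, a 0/1 vector over a fixed symptom vocabulary. The vocabulary, the function name, and the choice to reject unknown symptoms are assumptions made for illustration; the real symptom list would come from the training dataset.

```python
# Hypothetical symptom vocabulary; the real list comes from the dataset.
SYMPTOMS = ["cough", "fever", "headache", "nausea", "skin_rash"]


def encode_symptoms(selected, vocabulary=SYMPTOMS):
    """Turn the user's selected symptoms into a 0/1 feature vector."""
    chosen = {s.strip().lower() for s in selected}
    unknown = chosen - set(vocabulary)
    if unknown:
        # Reject input the model was never trained on instead of ignoring it.
        raise ValueError(f"Unknown symptoms: {sorted(unknown)}")
    return [1 if symptom in chosen else 0 for symptom in vocabulary]


print(encode_symptoms(["Fever", " cough"]))
```

Keeping the vocabulary order fixed matters, because the classifier expects each position in the vector to refer to the same symptom at training and prediction time.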

2.3 System Requirements

Computer-aided learning is a rapidly growing, dynamic area of research in the machine learning
industry. Recent research in machine learning promises improved accuracy in disease
prediction. Here computers are enabled to think by developing intelligence through learning.
Many types of machine learning techniques are used to classify datasets and analyze the
results.

2.3.1 Hardware Requirements

The hardware requirements specify each interface between the software elements and the hardware
elements of the system. These hardware requirements include configuration characteristics.

System : Pentium IV 2.4 GHz
Hard Disk : 10 GB or more
RAM : 1 GB
Input Devices : Keyboard and Mouse
Output Devices : Monitor

2.3.2 Software Requirements

The software requirements specify the use of all required software products, such as data
management systems. Each required software product is specified with its name and version
number. Each interface specifies the purpose of the interfacing software as related to this
software product.

Operating system : Windows XP/7/10
Platform : Visual Studio IDE
Front end : Tkinter
Back end : Python

2.3.3 Functional Requirements


A functional requirement defines a function of a system or its component. A function is
described as a set of inputs, the behaviour, and outputs. Functional requirements may be
calculations, technical details, data manipulation and processing, and other specific functionality
that define what a system is supposed to accomplish. Behavioural requirements describing all
cases where the system uses the functional requirements are captured in use cases. Functional
requirements are supported by non-functional requirements (also known as quality requirements),
which impose constraints on the design or implementation (such as performance requirements,
security, or reliability).
As defined in requirements engineering, functional requirements specify particular results of a
system. This should be contrasted with non-functional requirements, which specify overall
characteristics such as cost and reliability. Functional requirements drive the application
architecture of a system, while non-functional requirements drive the technical architecture of a
system.

• Functional requirements concern the specific functions delivered by the system. Functional
requirements are therefore statements of the services that the system must provide.

• The functional requirements of the system should be both complete and consistent.

• Completeness means that all the services required by the user should be defined.

2.3.4 Non-functional Requirements:

• Non-functional Requirements refer to the constraints or restrictions on the system. They may
relate to emergent system properties such as reliability, response time and store occupancy or the
selection of language, platform, implementation techniques and tools.

• The non-functional requirements can be built on the basis of needs of the user, budget
constraints, organization policies, etc.

1. Performance requirement: All data entered shall be correct and free of flaws for the
performance to be 100%.

2. Platform constraints: The main target is to generate an intelligent system to predict the
disease.

3. Accuracy and Precision: Requirements about the accuracy and precision of the data.

4. Modifiability: Requirements about the effort required to make changes in the software.
Often, the measurement is personnel effort (person- months).
5. Reliability: Requirements about how often the software fails. The definition of a failure must
be clear. Also, do not confuse reliability with availability, which is quite a different kind of
requirement. Be sure to specify the consequences of software failure, how to protect from
failure, a strategy for error detection, and a strategy for correction.
6. Security: One or more requirements about protection of your system and its data.

7. Usability: Requirements about how difficult it will be to learn and operate the system. The
requirements are often expressed in learning time or similar metrics.

2.3.5 Performance Requirements

Performance is measured in terms of the output provided by the application. Requirement
specification plays an important part in the analysis of a system. Only when the requirement
specifications are properly given is it possible to design a system that fits into the required
environment. It rests largely with the users of the existing system to give the requirement
specifications, because they are the people who will finally use the system. The requirements
have to be known during the initial stages so that the system can be designed according to
them. It is very difficult to change a system once it has been designed; on the other hand, a
system that does not cater to the requirements of the user is of no use. The requirement
specification for any system can be broadly stated as given below:

• The system should be able to interface with the existing system.

• The system should be accurate.

• The system should be better than the existing system.

The existing system is completely dependent on the user to perform all the duties.
2.3.6 Modules

The modules in contagious disease prediction are:


1. User Module
2. Prediction Module

1. USER MODULE

A user enters their details to register with the system. The input details are username,
email, phone, age, and password. If the user's details are correct, the user is registered. If
the user's details are incorrect, an error message is displayed. If the user is already
registered, an error message is displayed.
When the user tries to log in, the details of the user are verified by the system. If the login
details are correct, the user is logged in and the user page is displayed. If the login details
are incorrect, an error message is displayed.
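
The registration and login rules above can be sketched as follows. This is a minimal in-memory illustration, not the project's implementation: the `UserStore` class, the validation rules (an "@" in the email, a numeric phone number, a positive age, a password of at least 8 characters), and the use of salted PBKDF2 password hashing are all assumptions, since the report does not say how user details are checked or stored.

```python
import hashlib
import hmac
import os


class UserStore:
    """In-memory stand-in for the system's user table (hypothetical)."""

    def __init__(self):
        self._users = {}  # username -> (salt, password_hash, profile)

    @staticmethod
    def _hash(password, salt):
        # Salted PBKDF2 so that plain-text passwords are never stored.
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

    def register(self, username, email, phone, age, password):
        if username in self._users:
            return "error: user already registered"
        details_ok = (
            bool(username)
            and "@" in email
            and phone.isdigit()
            and age > 0
            and len(password) >= 8
        )
        if not details_ok:
            return "error: invalid details"
        salt = os.urandom(16)
        profile = {"email": email, "phone": phone, "age": age}
        self._users[username] = (salt, self._hash(password, salt), profile)
        return "registered"

    def login(self, username, password):
        record = self._users.get(username)
        if record is None:
            return "error: invalid login"
        salt, stored_hash, _profile = record
        # Constant-time comparison avoids leaking how many bytes matched.
        if hmac.compare_digest(stored_hash, self._hash(password, salt)):
            return "logged in"
        return "error: invalid login"


store = UserStore()
print(store.register("asha", "asha@example.com", "9876543210", 21, "s3cretpass"))
print(store.login("asha", "s3cretpass"))
```

Returning the same message for an unknown username and a wrong password is a deliberate choice in this sketch, so that the login form does not reveal which usernames exist.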

2. PREDICTION MODULE

The user needs to enter symptoms to get a prediction. If the user enters all 5 symptoms
correctly, the accuracy will be high; if the user enters only a few symptoms, the accuracy
will be low. After entering the symptoms, the user presses the button of the desired
algorithm. There are 3 buttons for the 3 algorithms; for example, if the user enters all the
symptoms and presses only the Random Forest button, the result is calculated using that
algorithm alone. We use 3 algorithms to give a clearer picture of the results, so that the
user can be satisfied with the predicted result.
2.4 System Analysis Method

2.4.1 Use Case Diagram

A use case is a definition of a set of action sequences. Graphically, it is drawn as a
solid-line ellipse containing only its name. A use case diagram is a behavioral diagram showing
a set of use cases and actors and their relationships. The main actor in the contagious disease
prediction application is the user.
2.4.2 Activity Diagram
Another important diagram in UML for representing the dynamic aspects of the system is
the activity diagram. The activity diagram is essentially a flowchart describing the flow from
one activity to another. An activity can be described as an operation of the system. The
diagram draws the control flow from one operation to the next.
2.5 System Design Methods
2.5.1 Class Diagram
The Class Diagram describes the structure of system classes, their attributes, operations
and the relationships among the objects. It describes the attributes and operations of a class and
also the constraints imposed on the system.
2.5.2 Sequence Diagram
Sequence diagrams are interaction diagrams that detail how operations are carried
out. They capture the interaction between objects in the context of a collaboration. Sequence
diagrams and collaboration diagrams are called INTERACTION DIAGRAMS. An interaction
diagram represents an interaction, which consists of a set of objects, their relationships, and
the messages that can be exchanged between them. A sequence diagram emphasizes the time
ordering of messages. Graphically, a sequence diagram is a table that shows objects arranged
along the X-axis and messages ordered in increasing time along the Y-axis.
CHAPTER 3
SYSTEM DESIGN
3.1 Introduction
System design is the process of defining the components, modules, interfaces, and data
for a system to satisfy specified requirements. System development is the process of creating or
altering systems, along with the processes, practices, models, and methodologies used to develop
them.

3.2 System Architectural Design

A system architecture is the conceptual model that defines the structure, behavior, and
more views of a system. A system architecture can consist of system components and the
sub-systems developed that will work together to implement the overall system.

3.2.1 Chosen System Architecture


3.2.2 System Interface Descriptions

A system interface is a logical interface that converts system input into its
output that displays on the user interface. The system interface is behind the user
interface and its processing will not be seen at the user end.

In our project, the Python program app.py acts as the system interface, and Tkinter is
used for the front end.
CHAPTER 4
SYSTEM IMPLEMENTATION
4.1 Tools and Technologies Used
4.1.1 Technologies:
PYTHON

Python is a multi-paradigm programming language. Object-oriented programming and
structured programming are fully supported, and many of its features support functional
programming and aspect-oriented programming (including by metaprogramming and
metaobjects). Many other paradigms are supported via extensions, including design by contract
and logic programming. Python uses dynamic typing and a combination of reference counting
and a cycle-detecting garbage collector for memory management. It also features dynamic name
resolution (late binding), which binds method and variable names during program execution.

Python's developers strive to avoid premature optimization, and reject patches to
non-critical parts of CPython that would offer marginal increases in speed at the cost of
clarity. When speed is important, a Python programmer can move time-critical functions to
extension modules written in languages such as C, or use PyPy, a just-in-time compiler. Cython
is also available, which translates a Python script into C and makes direct C-level API calls
into the Python interpreter. An important goal of Python's developers is keeping it fun to use.
Python's design offers some support for functional programming in the Lisp tradition. It has
filter, map, and reduce functions, list comprehensions, dictionaries, sets, and generator
expressions. The standard library has two modules (itertools and functools) that implement
functional tools borrowed from Haskell and Standard ML.
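
The functional tools mentioned above can be seen in a short, self-contained example. The temperature readings are made-up values used only for illustration.

```python
from functools import reduce
from itertools import chain

# Illustrative body temperatures in degrees Fahrenheit (made-up values).
readings = [98.6, 101.2, 99.1, 103.4]

# filter: keep only the readings at or above the fever threshold.
fevers = list(filter(lambda t: t >= 100.4, readings))

# map: convert each remaining reading to degrees Celsius.
celsius = list(map(lambda t: round((t - 32) * 5 / 9, 1), fevers))

# reduce (from functools): fold the whole list into a single value.
highest = reduce(max, readings)

# List comprehension: build a new list from an expression.
squares = [n * n for n in range(5)]

# Generator expression: like a comprehension, but evaluated lazily.
evens = (n for n in range(10) if n % 2 == 0)

# itertools helper: iterate over several sequences as if they were one.
merged = list(chain([1, 2], [3, 4]))

print(fevers, celsius, highest, squares, list(evens), merged)
```

In everyday Python, comprehensions are usually preferred over `map` and `filter` with lambdas, but all of these forms are available.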
BENEFITS OF PYTHON
• Presence of Third-Party Modules

• Extensive Support Libraries

• Open Source and Community Development

• Learning Ease and Support Available

• User-friendly Data Structures

• Productivity and Speed

• Highly Extensible and Easily Readable Language.

Installation
Install a Python interpreter

Along with the Python extension, you need to install a Python interpreter. Which interpreter you
use is dependent on your specific needs, but some guidance is provided below.
Windows

Install Python from python.org. You can typically use the Download Python button that appears
first on the page to download the latest version.
Note: If you don't have admin access, an additional option for installing Python on Windows is to
use the Microsoft Store. The Microsoft Store provides installs of Python 3.7, Python 3.8, and
Python 3.9. Be aware that you might have compatibility issues with some packages using this
method.
For additional information about using Python on Windows, see Using Python on Windows at
Python.org
macOS

The system install of Python on macOS is not supported. Instead, an installation through
Homebrew is recommended. To install Python using Homebrew on macOS use brew install
python3 at the Terminal prompt.

Note: On macOS, make sure the location of your VS Code installation is included in your PATH
environment variable. See these setup instructions for more information.
Linux

The built-in Python 3 installation on Linux works well, but to install other Python packages you
must install pip with get-pip.py.
Other options
● Data Science: If your primary purpose for using Python is Data Science, then you might
consider a download from Anaconda. Anaconda provides not just a Python interpreter,
but many useful libraries and tools for data science.
● Windows Subsystem for Linux: If you are working on Windows and want a Linux
environment for working with Python, the Windows Subsystem for Linux (WSL) is an
option for you. If you choose this option, you'll also want to install the Remote - WSL
extension. For more information about using WSL with VS Code, see VS Code Remote
Development or try the Working in WSL tutorial, which will walk you through setting up
WSL, installing Python, and creating a Hello World application running in WSL.
Verify the Python installation
To verify that you've installed Python successfully on your machine, run one of the following
commands (depending on your operating system):
● Linux/macOS: open a Terminal Window and type the following command:
python3 --version

● Windows: open a command prompt and run the following command:


py -3 --version

If the installation was successful, the output window should show the version of Python that you
installed.

TKINTER INTERFACE
Tkinter is a Python binding to the Tk GUI toolkit. It is the standard Python interface to
the Tk GUI toolkit and is Python's de facto standard GUI. Tkinter is included with standard
Linux, Microsoft Windows and Mac OS X installs of Python. The name Tkinter comes from Tk
interface. Tkinter was written by Fredrik Lundh. Tkinter is free software released under a Python
license.
As with most other modern Tk bindings, Tkinter is implemented as a Python wrapper around a
complete Tool Command Language (TCL) interpreter embedded in the Python interpreter.
Tkinter calls are translated into Tcl commands which are fed to this embedded interpreter, thus
making it possible to mix Python and TCL in a single application. In Tkinter,
the Frame widget is the basic unit of organization for complex layouts. A frame is a rectangular
area that can contain other widgets. When any widget is created, a parent child relationship is
created. For example, if you place a text label inside a frame, the frame is the parent of the label.

Python offers multiple options for developing a GUI (Graphical User Interface). Of all the GUI
methods, tkinter is the most commonly used. It is the standard Python interface to the Tk GUI
toolkit shipped with Python. Python with tkinter is the fastest and easiest way to create
GUI applications.

To create a Tkinter application:

1. Import the tkinter module.

2. Create the main window (container).

3. Add any number of widgets to the main window.

4. Bind event handlers to the widgets.

MACHINE LEARNING
Tom Mitchell defines machine learning as follows: “A computer program is said to learn from
experience E with respect to some class of tasks T and performance measure P, if its
performance at tasks in T, as measured by P, improves with experience E.” Machine learning is
largely about correlations and relationships: most machine learning algorithms in existence are
concerned with finding and/or exploiting relationships in datasets. Once a machine learning
algorithm has identified certain correlations, the model can either use these relationships to
predict future observations or generalize the data to reveal interesting patterns. Machine
learning includes many algorithms and related concepts, such as Linear Regression, Logistic
Regression, the Naive Bayes classifier (based on Bayes' theorem), KNN (K-Nearest Neighbor
classifier), Decision Trees (built with measures such as entropy, for example in ID3), SVM
(Support Vector Machines), the K-means algorithm, and Random Forest.

The name machine learning was coined in 1959 by Arthur Samuel. Machine learning
explores the study and construction of algorithms that can learn from and make predictions on
data. Machine learning is closely related to (and often overlaps with) computational statistics,
which also focuses on prediction-making through the use of computers. It has strong ties to
mathematical optimization, which delivers methods, theory and application domains to the field.
Machine learning is sometimes conflated with data mining, although the latter focuses more on
exploratory data analysis and is closely associated with unsupervised learning.

Within the field of data analytics, machine learning is a method used to devise complex
models and algorithms that lend themselves to prediction; in commercial use, this is known as
predictive analytics. These analytical models allow researchers, data scientists, engineers, and
analysts to "produce reliable, repeatable decisions and results" and uncover "hidden insights"
through learning from historical relationships and trends in the data.

Machine learning tasks

Machine learning tasks are typically classified into several broad categories:

Supervised learning: The computer is presented with example inputs and their desired outputs,
given by a "teacher", and the goal is to learn a general rule that maps inputs to outputs. In special
cases, the input signal can be only partially available, or restricted to special feedback.

Semi-supervised learning: The computer is given only an incomplete training signal: a training
set with some (often many) of the target outputs missing.

Active learning: The computer can only obtain training labels for a limited set of instances
(based on a budget), and also has to optimize its choice of objects to acquire labels for. When
used interactively, these can be presented to the user for labelling.
Unsupervised learning: No labels are given to the learning algorithm, leaving it on its own to
find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden
patterns in data) or a means towards an end (feature learning).

Reinforcement learning: Data (in the form of rewards and punishments) are given only as
feedback to the program's actions in a dynamic environment, such as driving a vehicle or playing
a game against an opponent.

FEATURES OF MACHINE LEARNING

• It amounts to automating automation itself.

• It is about getting computers to program themselves.

• It addresses the bottleneck of writing software by hand.

• Machine learning models learn from data with little or no human intervention.

• Machine learning is the science of making computers learn and act like humans by feeding
them data and information, without being explicitly programmed.

• Machine learning differs from traditional programming: here the data and the expected output
are given to the computer, and in return it produces the program (model) that solves the
problem.

• Machine learning is a combination of algorithms, datasets, and programs.

• There are many machine learning algorithms that can be used to predict a patient's disease.

• Machine learning is applied in many areas, such as web search, computational biology,
finance, e-commerce, space exploration, robotics, social networks, debugging and much more.

• There are three main types of machine learning: supervised, unsupervised, and reinforcement.

Algorithms

1. Decision Tree

Decision Tree is a Supervised learning technique that can be used for both classification
and Regression problems, but mostly it is preferred for solving Classification problems.
It is a tree-structured classifier, where internal nodes represent the features of a dataset,
branches represent the decision rules and each leaf node represents the outcome.

In a decision tree, there are two types of nodes: the Decision Node and the Leaf Node.
Decision nodes are used to make a decision and have multiple branches, whereas leaf
nodes are the output of those decisions and do not contain any further branches.

The decisions or tests are performed on the basis of the features of the given dataset.

It is a graphical representation for getting all the possible solutions to a problem/decision
based on given conditions.

It is called a decision tree because, similar to a tree, it starts with the root node, which
expands on further branches and constructs a tree-like structure.

In order to build a tree, we use the CART algorithm, which stands for Classification and
Regression Tree algorithm.

A decision tree simply asks a question and, based on the answer (Yes/No), further splits
the tree into subtrees.

How does the Decision Tree algorithm Work?

In a decision tree, to predict the class of a given record, the algorithm starts from the
root node of the tree. It compares the value of the root attribute with the corresponding
attribute of the record (from the real dataset) and, based on the comparison, follows the
branch and jumps to the next node.

At the next node, the algorithm again compares the attribute value with the sub-nodes
and moves further. It continues the process until it reaches a leaf node of the tree. The
complete process can be better understood using the below algorithm:

● Step-1: Begin the tree with the root node, say S, which contains the complete dataset.

● Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).

● Step-3: Divide S into subsets that contain the possible values of the best attribute.

● Step-4: Generate the decision tree node that contains the best attribute.

● Step-5: Recursively make new decision trees using the subsets of the dataset created in
Step-3. Continue this process until a stage is reached where the nodes cannot be classified
further; such a final node is called a leaf node.
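
The five steps above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the scikit-learn implementation used in the project: it uses information gain as the attribute selection measure (an ID3-style choice rather than CART), and the tiny symptom table and the function names build_tree and predict are made up for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_attribute(rows, labels, attributes):
    """Step 2: pick the attribute with the highest information gain."""
    def gain(attr):
        remainder = 0.0
        for value in set(row[attr] for row in rows):
            subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder
    return max(attributes, key=gain)


def build_tree(rows, labels, attributes):
    """Steps 1-5: split recursively until a node is pure or no attributes remain."""
    if len(set(labels)) == 1:              # pure node: make it a leaf
        return labels[0]
    if not attributes:                     # nothing left to split on: majority leaf
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, labels, attributes)
    node = {"attribute": attr, "branches": {}}
    remaining = [a for a in attributes if a != attr]
    for value in set(row[attr] for row in rows):          # Step 3: one subset per value
        idx = [i for i, row in enumerate(rows) if row[attr] == value]
        node["branches"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return node


def predict(tree, row, default=None):
    """Walk from the root to a leaf, following the branch that matches each value."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attribute"]], default)
    return tree


# Hypothetical toy data: two symptoms and a diagnosis label.
rows = [
    {"fever": "yes", "cough": "yes"},
    {"fever": "yes", "cough": "no"},
    {"fever": "no", "cough": "yes"},
    {"fever": "no", "cough": "no"},
]
labels = ["flu", "flu", "cold", "healthy"]

tree = build_tree(rows, labels, ["fever", "cough"])
print(tree)
print(predict(tree, {"fever": "no", "cough": "yes"}))
```

On this toy table the root splits on "fever", because that attribute separates the classes best; the "no" branch is then split again on "cough".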

Attribute Selection Measures

While implementing a decision tree, the main issue is how to select the best attribute
for the root node and for the sub-nodes. To solve this problem there is a technique
called the Attribute Selection Measure, or ASM. With this measure, we can select the
best attribute for the nodes of the tree. There are two popular techniques for ASM, which are:
● Information Gain

● Gini Index

1. Information Gain:

● Information gain is the measurement of changes in entropy after the segmentation of a
dataset based on an attribute.

● It calculates how much information a feature provides us about a class.

● According to the value of information gain, we split the node and build the decision tree.

● A decision tree algorithm always tries to maximize the value of information gain, and the
node/attribute having the highest information gain is split first. It can be calculated
using the below formula:

$$\text{Information Gain} = \text{Entropy}(S) - \big[(\text{Weighted Avg}) \times \text{Entropy}(\text{each feature})\big]$$

Entropy: Entropy is a metric to measure the impurity in a given attribute. It specifies the
randomness in the data. Entropy can be calculated as:

$$\text{Entropy}(S) = -P(\text{yes})\log_2 P(\text{yes}) - P(\text{no})\log_2 P(\text{no})$$

Where,

● S = the set of all samples

● P(yes) = probability of yes

● P(no) = probability of no
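
As a hedged worked example of the two formulas above, the sketch below computes the entropy of a node and the information gain of one candidate split. The yes/no counts are illustrative values chosen for the example, not taken from the project dataset, and the helper names are assumptions.

```python
import math


def entropy(p_yes):
    """Binary entropy in bits for a node whose probability of 'yes' is p_yes."""
    p_no = 1.0 - p_yes
    # Terms with zero probability contribute nothing, so they are skipped.
    return -sum(p * math.log2(p) for p in (p_yes, p_no) if p > 0)


def information_gain(parent, children):
    """parent and each child are (yes_count, no_count) tuples."""
    total = sum(parent)
    # Weighted average of the child entropies, weighted by child size.
    weighted = sum((y + n) / total * entropy(y / (y + n)) for y, n in children)
    return entropy(parent[0] / total) - weighted


# Illustrative counts: 9 'yes' and 5 'no' samples, split three ways by one attribute.
parent = (9, 5)
children = [(2, 3), (4, 0), (3, 2)]

parent_entropy = entropy(parent[0] / sum(parent))
gain = information_gain(parent, children)
print(round(parent_entropy, 3), round(gain, 3))
```

A perfectly mixed node (P(yes) = 0.5) has entropy 1 bit and a pure node has entropy 0, which is why the attribute whose split leaves the purest children has the highest gain.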

2. Gini Index:

● Gini index is a measure of impurity or purity used while creating a decision tree in the
CART (Classification and Regression Tree) algorithm.

● An attribute with a low Gini index should be preferred over one with a high Gini
index.

● The CART algorithm creates only binary splits, and it uses the Gini index to choose
them.

Gini index can be calculated using the below formula, where P_j is the proportion of samples
belonging to class j:

$$\text{Gini Index} = 1 - \sum_j P_j^2$$
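
The formula above can be checked with a short sketch. The helper names gini_index and weighted_gini are assumptions made for this example; weighted_gini scores a binary split in the way CART compares candidate splits, by weighting each side's impurity by its size.

```python
from collections import Counter


def gini_index(labels):
    """Gini index of a node: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def weighted_gini(left, right):
    """Impurity of a binary split: size-weighted average of the two sides."""
    total = len(left) + len(right)
    return (len(left) / total) * gini_index(left) + (len(right) / total) * gini_index(right)


mixed = ["yes", "yes", "no", "no"]
print(gini_index(mixed))                              # evenly mixed two-class node
print(weighted_gini(["yes", "yes"], ["no", "no"]))    # split that separates the classes
print(weighted_gini(["yes", "no"], ["yes", "no"]))    # split that separates nothing
```

The split with the lower weighted Gini index is preferred, which matches the rule stated in the list above.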

Pruning: Getting an Optimal Decision tree


Pruning is the process of deleting unnecessary nodes from a tree in order to get the optimal
decision tree.

A tree that is too large increases the risk of overfitting, while a small tree may not capture
all the important features of the dataset. A technique that decreases the size of the
learned tree without reducing accuracy is therefore known as pruning. There are mainly two
types of tree pruning techniques:

● Cost Complexity Pruning

● Reduced Error Pruning.

Advantages of the Decision Tree

● It is simple to understand, as it follows the same process that a human follows while
making a decision in real life.

● It can be very useful for solving decision-related problems.

● It helps to think about all the possible outcomes for a problem.

● There is less requirement of data cleaning compared to other algorithms.

Disadvantages of the Decision Tree


● A decision tree can contain many layers, which makes it complex.

● It may have an overfitting issue, which can be mitigated using the Random Forest
algorithm.

● With more class labels, the computational complexity of the decision tree may increase.

2. Random Forest
Random Forest is a popular machine learning algorithm that belongs to the supervised learning
technique. It can be used for both Classification and Regression problems in ML. It is based on
the concept of ensemble learning, which is a process of combining multiple classifiers to solve
a complex problem and to improve the performance of the model.

As the name suggests, "Random Forest is a classifier that contains a number of decision trees
on various subsets of the given dataset and takes the average to improve the predictive
accuracy of that dataset." Instead of relying on one decision tree, the random forest takes the
prediction from each tree and predicts the final output based on the majority vote of those
predictions.

A greater number of trees in the forest generally leads to more stable, higher accuracy and
reduces the risk of overfitting, although the improvement levels off beyond a certain number
of trees.

Assumptions for Random Forest


Since the random forest combines multiple trees to predict the class of the dataset, it is
possible that some decision trees may predict the correct output, while others may not. But
together, all the trees predict the correct output. Therefore, below are two assumptions for a
better Random forest classifier:

● There should be some actual values in the feature variable of the dataset so that the
classifier can predict accurate results rather than a guessed result.

● The predictions from each tree must have very low correlations.

Why use Random Forest?


Below are some points that explain why we should use the Random Forest algorithm:

● It typically takes less training time than many other algorithms, because the trees can be
trained independently of one another.

● It predicts output with high accuracy and runs efficiently even on large datasets.

● It can also maintain accuracy when a large proportion of data is missing.

How does Random Forest algorithm work?


Random Forest works in two phases: the first is to create the random forest by combining N
decision trees, and the second is to make predictions with each tree created in the first phase.

The working process can be explained in the below steps:

Step-1: Select K random data points from the training set.

Step-2: Build the decision tree associated with the selected data points (subset).

Step-3: Choose the number N of decision trees that you want to build.

Step-4: Repeat Steps 1 and 2 until N trees have been built.

Step-5: For new data points, find the predictions of each decision tree, and assign the new data
points to the category that wins the majority votes.
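
The steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the RandomForestClassifier used in the project: each "tree" is reduced to a one-level decision stump, features are binary, the sampling is done with replacement, and the toy rows, labels and function names are made up for the example. A full random forest also grows deeper trees and randomizes the features considered at each split.

```python
import random
from collections import Counter


def train_stump(rows, labels):
    """One-level stand-in for a decision tree: use the single binary feature
    whose per-value majority label classifies the sample best."""
    best = None
    for f in range(len(rows[0])):
        rule = {}
        for value in (0, 1):
            subset = [lab for row, lab in zip(rows, labels) if row[f] == value]
            if subset:
                rule[value] = Counter(subset).most_common(1)[0][0]
        correct = sum(rule[row[f]] == lab for row, lab in zip(rows, labels))
        if best is None or correct > best[0]:
            best = (correct, f, rule)
    _, feature, rule = best
    fallback = Counter(labels).most_common(1)[0][0]   # used for unseen feature values
    return lambda row: rule.get(row[feature], fallback)


def train_forest(rows, labels, n_trees, k, seed=0):
    """Steps 1-4: build n_trees stumps, each on k randomly drawn data points."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(k)]           # Step 1
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx]))         # Step 2
    return forest


def predict_forest(forest, row):
    """Step 5: every tree votes and the majority class wins."""
    votes = Counter(tree(row) for tree in forest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: features are [fever, cough, rash] encoded as 0/1.
rows = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 0, 0],
        [0, 1, 0], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
labels = ["flu"] * 4 + ["healthy"] * 4

forest = train_forest(rows, labels, n_trees=15, k=8)
print(predict_forest(forest, [1, 0, 1]))
print(predict_forest(forest, [0, 1, 0]))
```

Because each stump sees a different random sample, an individual stump can be wrong (for example when its sample happens to contain only one class), but the majority vote smooths out such errors.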

Advantages of Random Forest

● Random Forest is capable of performing both Classification and Regression tasks.

● It is capable of handling large datasets with high dimensionality.

● It enhances the accuracy of the model and prevents the overfitting issue.

Disadvantages of Random Forest

● Although random forest can be used for both classification and regression tasks, it is
less suitable for regression tasks.
3. Naive Bayes
● Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes
theorem and used for solving classification problems.

● It is mainly used in text classification that includes a high-dimensional training dataset.

● The Naïve Bayes classifier is one of the simplest and most effective classification
algorithms; it helps in building fast machine learning models that can make quick
predictions.

● It is a probabilistic classifier, which means it predicts on the basis of the probability of
an object.

● Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment
analysis, and classifying articles.

Why is it called Naïve Bayes?


The name of the Naïve Bayes algorithm is made up of two words, Naïve and Bayes, which can be
described as:

● Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is
independent of the occurrence of other features. For example, if a fruit is identified on the
basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an
apple. Each feature individually contributes to identifying it as an apple, without
depending on the others.

● Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.

Bayes' Theorem:

● Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to determine
the probability of a hypothesis with prior knowledge. It depends on the conditional
probability.

● The formula for Bayes' theorem is given as:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Where,

P(A|B) is Posterior probability: Probability of hypothesis A given the observed event B.

P(B|A) is Likelihood probability: Probability of the evidence given that the hypothesis is
true.

P(A) is Prior Probability: Probability of hypothesis before observing the evidence.

P(B) is Marginal Probability: Probability of Evidence.

Working of Naïve Bayes' Classifier:


Working of Naïve Bayes' Classifier can be understood with the help of the below example:

Suppose we have a dataset of weather conditions and a corresponding target variable "Play".
Using this dataset, we need to decide whether or not we should play on a particular day
according to the weather conditions. To solve this problem, we need to follow the below
steps:

1. Convert the given dataset into frequency tables.

2. Generate a likelihood table by finding the probabilities of the given features.

3. Now, use Bayes' theorem to calculate the posterior probability.
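
The three steps can be traced on a small example. The 14 weather observations below are illustrative values invented for this sketch (the report does not list its own table), and the function name posterior is an assumption. With a single feature the naive independence assumption plays no role, so the sketch shows only the frequency-table and Bayes' theorem parts of the method.

```python
from collections import Counter

# Illustrative (made-up) observations of (weather, play).
data = ([("Sunny", "No")] * 3 + [("Sunny", "Yes")] * 2 +
        [("Overcast", "Yes")] * 4 +
        [("Rainy", "Yes")] * 3 + [("Rainy", "No")] * 2)

# Step 1: frequency tables.
joint = Counter(data)                           # counts of each (weather, play) pair
weather_counts = Counter(w for w, _ in data)    # counts of each weather value
play_counts = Counter(p for _, p in data)       # counts of each class
n = len(data)


def posterior(play, weather):
    """P(play | weather) computed with Bayes' theorem."""
    likelihood = joint[(weather, play)] / play_counts[play]   # Step 2: P(weather | play)
    prior = play_counts[play] / n                             # P(play)
    evidence = weather_counts[weather] / n                    # P(weather)
    return likelihood * prior / evidence                      # Step 3


p_yes = posterior("Yes", "Sunny")
p_no = posterior("No", "Sunny")
decision = "Yes" if p_yes > p_no else "No"
print(round(p_yes, 2), round(p_no, 2), decision)
```

The class with the larger posterior probability is chosen, so on this made-up table a sunny day is classified as "No".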

Advantages of Naïve Bayes Classifier:

● Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the class of a
dataset.

● It can be used for binary as well as multi-class classification.

● It performs well in multi-class predictions compared to many other algorithms.

● It is a popular choice for text classification problems.

Disadvantages of Naïve Bayes Classifier:

● Naive Bayes assumes that all features are independent or unrelated, so it cannot learn
the relationship between features.

4.1.2 Tools

The tool used is VISUAL STUDIO IDE.

Visual Studio Code is a code editor redefined and optimized for building and debugging modern
web and cloud applications.
INSTALL VISUAL STUDIO IDE

Step 1. For downloading the Visual Studio IDE, go through the link given below:

https://www.visualstudio.com/downloads

Step 2. After you click the download link, an .exe file starts downloading.

Step 3. Click on the .exe file; a pop-up window is shown.

Step 4. Click on the Run button; the installer window is then shown.

Step 5. Click on the Continue button.

Step 6. After clicking on Continue, Visual Studio will start downloading its initial files.

Step 7. Click on the Install button and open the IDE for code execution.
4.2 Screenshots

1. User Login Page

2. User Registration Page

3. Disease Prediction Page
CHAPTER V
TESTING
5.1 Introduction
Testing is the process in which test data is prepared and used to test the modules
individually, followed by validation of the input fields. System testing then takes place
to make sure that all components of the system function properly as a unit. The test data
should be chosen so that it passes through all possible conditions. The following is a
description of the testing strategies that were carried out during the testing period.

5.2 TESTING METHODS

SYSTEM TESTING

Testing has become an integral part of any system or project, especially in the field of
information technology. Testing is a way of establishing whether one is ready to move further
and whether the system can withstand the rigors of a particular situation; its importance
cannot be underplayed, which is why testing before deployment is so critical. Before the
developed software is given to the user, it must be tested to check whether it serves the
purpose for which it was developed. This testing involves various types through which one can
ensure the software is reliable. The program was tested logically, and the pattern of execution
of the program was repeated for a set of data. Thus the code was checked thoroughly with valid
input data, and the outcomes were also checked.
MODULE TESTING

To locate errors, each module is tested individually. This enables us to detect an error and
correct it without affecting any other modules. Whenever the program does not satisfy the
required function, it must be corrected to get the required result. Thus all the modules are
individually tested bottom-up, starting with the smallest and lowest-level modules and
proceeding to the next level. For example, the job classification module is tested separately:
it is tested with different jobs and their approximate execution times, and the result of the
test is compared with results prepared manually.

INTEGRATION TESTING

After module testing, integration testing is applied. When the modules are linked, errors may
occur; these errors are detected and corrected through this testing. In this system all
modules are connected and tested together. The test results were correct: the mapping of jobs
with resources is done correctly by the system.

ACCEPTANCE TESTING

When the user finds no major problems with its accuracy, the system passes through a final
acceptance test. This test confirms that the system meets the original goals, objectives and
requirements established during analysis, without actual execution, which eliminates wastage
of time and money. Acceptance tests rest on the shoulders of users and management; once they
pass, the system is finally acceptable and ready for operation.
5.3 Test Cases

Case 1: User logs in to the system using valid details.
After valid login details are entered, the prediction page is displayed.

Case 2: User logs in to the system using invalid details.
An error message is displayed.

Case 3: User registers in the system.
A registration success message is displayed.

Case 4: User registers in the system with an already registered email.
An "email already exists" error message is displayed.

Case 5: Disease prediction using symptoms given by the user as input.
The predicted disease is displayed.
CHAPTER VI
6.1 CONCLUSION
In conclusion, this project, Disease Prediction Using Machine Learning, is useful in everyday
life and is especially important for the healthcare sector, which can use such systems daily to
predict patients' diseases based on their general information and the symptoms they are
experiencing. Nowadays the health industry plays a major role in curing patients' diseases, so
this system is a help to the health industry, and it is also useful for a user who does not
want to go to a hospital or clinic: just by entering the symptoms and other useful information,
the user can learn which disease he/she is likely suffering from. The health industry can also
benefit from this system by asking the user for symptoms and entering them in the system;
within a few seconds it returns the predicted disease, which is accurate to some extent. If the
health industry adopts this project, the work of doctors can be reduced and they can more
easily predict the disease of the patient. The purpose of disease prediction is to provide
predictions for various commonly occurring diseases that, when unchecked and sometimes ignored,
can turn fatal and cause a lot of problems for the patient as well as their family members.

6.2 FUTURE ENHANCEMENT

• Facility for modifying user detail.


• More interactive user interface.
• Facilities for Backup creation.
• Can be done as a Web page.
• Can be done as a Mobile Application.
• More Details and Latest Diseases.
CHAPTER VII
APPENDICES
7.1 CODE SNIPPET
import numpy as np
import pandas as pd
import mysql.connector
import tkinter as tk
from tkinter import *
from tkinter import ttk, messagebox

import analysis  # project-local module that holds the shared result lists (l5, a, dis)


def a():
    # Symptom names; these must match the column names in Training.csv and Testing.csv.
    l1=['back_pain','constipation','abdominal_pain','diarrhoea','mild_fever','yellow_urine',
        'yellowing_of_eyes','acute_liver_failure','fluid_overload','swelling_of_stomach',
        'swelled_lymph_nodes','malaise','blurred_and_distorted_vision','phlegm','throat_irritation',
        'redness_of_eyes','sinus_pressure','runny_nose','congestion','chest_pain','weakness_in_limbs',
        'fast_heart_rate','pain_during_bowel_movements','pain_in_anal_region','bloody_stool',
        'irritation_in_anus','neck_pain','dizziness','cramps','bruising','obesity','swollen_legs',
        'swollen_blood_vessels','puffy_face_and_eyes','enlarged_thyroid','brittle_nails',
        'swollen_extremeties','excessive_hunger','extra_marital_contacts','drying_and_tingling_lips',
        'slurred_speech','knee_pain','hip_joint_pain','muscle_weakness','stiff_neck','swelling_joints',
        'movement_stiffness','spinning_movements','loss_of_balance','unsteadiness',
        'weakness_of_one_body_side','loss_of_smell','bladder_discomfort','foul_smell_of urine',
        'continuous_feel_of_urine','passage_of_gases','internal_itching','toxic_look_(typhos)',
        'depression','irritability','muscle_pain','altered_sensorium','red_spots_over_body','belly_pain',
        'abnormal_menstruation','dischromic _patches','watering_from_eyes','increased_appetite',
        'polyuria','family_history','mucoid_sputum',
        'rusty_sputum','lack_of_concentration','visual_disturbances','receiving_blood_transfusion',
        'receiving_unsterile_injections','coma','stomach_bleeding','distention_of_abdomen',
        'history_of_alcohol_consumption','fluid_overload','blood_in_sputum','prominent_veins_on_calf',
        'palpitations','painful_walking','pus_filled_pimples','blackheads','scurring','skin_peeling',
        'silver_like_dusting','small_dents_in_nails','inflammatory_nails','blister','red_sore_around_nose',
        'Yellow_crust_ooze']

    # Disease names, indexed by the integer class label used for training.
    disease=['Fungal infection','Allergy','GERD','Chronic cholestasis','Drug Reaction',
        'Peptic ulcer diseae','AIDS','Diabetes','Gastroenteritis','Bronchial Asthma','Hypertension',
        'Migraine','Cervical spondylosis','Paralysis (brain hemorrhage)','Jaundice','Malaria',
        'Chicken pox','Dengue','Typhoid','hepatitis A','Hepatitis B','Hepatitis C','Hepatitis D',
        'Hepatitis E','Alcoholic hepatitis','Tuberculosis','Common Cold','Pneumonia',
        'Dimorphic hemmorhoids(piles)','Heart attack','Varicose veins','Hypothyroidism',
        'Hyperthyroidism','Hypoglycemia','Osteoarthristis','Arthritis',
        '(vertigo) Paroymsal Positional Vertigo','Acne','Urinary tract infection','Psoriasis',
        'Impetigo']

    # One 0/1 slot per symptom, all initially unset.
    l2 = [0] * len(l1)

    # Map each disease name in the 'prognosis' column to an integer class label.
    # The keys must match the spellings used in the CSV files exactly, including spacing.
    label_map = {'Fungal infection':0,'Allergy':1,'GERD':2,'Chronic cholestasis':3,
        'Drug Reaction':4,'Peptic ulcer diseae':5,'AIDS':6,'Diabetes':7,'Gastroenteritis':8,
        'Bronchial Asthma':9,'Hypertension ':10,'Migraine':11,'Cervical spondylosis':12,
        'Paralysis (brain hemorrhage)':13,'Jaundice':14,'Malaria':15,'Chicken pox':16,
        'Dengue':17,'Typhoid':18,'hepatitis A':19,'Hepatitis B':20,'Hepatitis C':21,
        'Hepatitis D':22,'Hepatitis E':23,'Alcoholic hepatitis':24,'Tuberculosis':25,
        'Common Cold':26,'Pneumonia':27,'Dimorphic hemmorhoids(piles)':28,'Heart attack':29,
        'Varicose veins':30,'Hypothyroidism':31,'Hyperthyroidism':32,'Hypoglycemia':33,
        'Osteoarthristis':34,'Arthritis':35,'(vertigo) Paroymsal Positional Vertigo':36,
        'Acne':37,'Urinary tract infection':38,'Psoriasis':39,'Impetigo':40}

    # TRAINING DATA df -----------------------------------------------------------------------
    df = pd.read_csv("Training.csv")
    df.replace({'prognosis': label_map}, inplace=True)

    # print(df.head())
    X = df[l1]
    y = np.ravel(df[["prognosis"]])  # flatten to the 1-D label array scikit-learn expects
    # print(y)

    # TESTING DATA tr ------------------------------------------------------------------------
    tr = pd.read_csv("Testing.csv")
    tr.replace({'prognosis': label_map}, inplace=True)
    X_test = tr[l1]
    y_test = np.ravel(tr[["prognosis"]])
    # ----------------------------------------------------------------------------------------
    acc = 0

    def DecisionTree():
        from sklearn import tree
        from sklearn.metrics import accuracy_score

        clf3 = tree.DecisionTreeClassifier()  # untrained decision tree model
        clf3 = clf3.fit(X, np.ravel(y))

        # calculating accuracy on the test set
        y_pred = clf3.predict(X_test)
        print(accuracy_score(y_test, y_pred))                   # fraction classified correctly
        print(accuracy_score(y_test, y_pred, normalize=False))  # number classified correctly
        acc = accuracy_score(y_test, y_pred)
        analysis.l5.append(acc)

        # encode the selected symptoms as a fresh 0/1 vector in the order of l1,
        # so that symptoms from an earlier prediction do not carry over
        psymptoms = [Symptom1.get(), Symptom2.get(), Symptom3.get(), Symptom4.get(), Symptom5.get()]
        print(psymptoms)
        inputtest = [[1 if symptom in psymptoms else 0 for symptom in l1]]
        predicted = clf3.predict(inputtest)[0]

        # translate the predicted class label back into a disease name
        h = 'no'
        if 0 <= predicted < len(disease):
            h = 'yes'
            analysis.dis.append(disease[predicted])
        analysis.a.append(h)

    def randomforest():
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import accuracy_score

        clf4 = RandomForestClassifier()
        clf4 = clf4.fit(X, np.ravel(y))

        # calculating accuracy on the test set
        y_pred = clf4.predict(X_test)
        print(accuracy_score(y_test, y_pred))
        print(accuracy_score(y_test, y_pred, normalize=False))
        acc = accuracy_score(y_test, y_pred)
        analysis.l5.append(acc)

        # encode the selected symptoms as a fresh 0/1 vector in the order of l1
        psymptoms = [Symptom1.get(), Symptom2.get(), Symptom3.get(), Symptom4.get(), Symptom5.get()]
        inputtest = [[1 if symptom in psymptoms else 0 for symptom in l1]]
        predicted = clf4.predict(inputtest)[0]

        # translate the predicted class label back into a disease name
        h = 'no'
        if 0 <= predicted < len(disease):
            h = 'yes'
            analysis.dis.append(disease[predicted])
        analysis.a.append(h)

    def NaiveBayes():
        from sklearn.naive_bayes import GaussianNB
        from sklearn.metrics import accuracy_score

        gnb = GaussianNB()
        gnb = gnb.fit(X, np.ravel(y))

        # calculating accuracy on the test set
        y_pred = gnb.predict(X_test)
        print(accuracy_score(y_test, y_pred))
        print(accuracy_score(y_test, y_pred, normalize=False))
        acc = accuracy_score(y_test, y_pred)
        analysis.l5.append(acc)

        # encode the selected symptoms as a fresh 0/1 vector in the order of l1
        psymptoms = [Symptom1.get(), Symptom2.get(), Symptom3.get(), Symptom4.get(), Symptom5.get()]
        inputtest = [[1 if symptom in psymptoms else 0 for symptom in l1]]
        predicted = gnb.predict(inputtest)[0]
        print(analysis.l5)

        # translate the predicted class label back into a disease name
        h = 'no'
        if 0 <= predicted < len(disease):
            h = 'yes'
            analysis.dis.append(disease[predicted])
        analysis.a.append(h)

        # show the prediction of the model with the highest test accuracy
        # (index 0: decision tree, 1: random forest, 2: naive Bayes)
        if (analysis.a[0] == "yes" and analysis.l5[0] >= analysis.l5[1]
                and analysis.l5[0] >= analysis.l5[2]):
            t1.delete("1.0", END)
            t1.insert(END, analysis.dis[0])
        elif (analysis.a[1] == "yes" and analysis.l5[1] >= analysis.l5[0]
                and analysis.l5[1] >= analysis.l5[2]):
            t1.delete("1.0", END)
            t1.insert(END, analysis.dis[1])
        else:
            t1.delete("1.0", END)
            t1.insert(END, analysis.dis[2])

    root = Tk()

    # entry variables: one StringVar per symptom drop-down, initialised to the string "None"
    Symptom1 = StringVar()
    Symptom1.set(None)
    Symptom2 = StringVar()
    Symptom2.set(None)
    Symptom3 = StringVar()
    Symptom3.set(None)
    Symptom4 = StringVar()
    Symptom4.set(None)
    Symptom5 = StringVar()
    Symptom5.set(None)
    Name = StringVar()

    # Heading
    w2 = Label(root, justify=LEFT, text="Disease Predictor using Machine Learning")
    w2.config(font=("Elephant", 30))
    w2.grid(row=1, column=0, columnspan=2, padx=100)

    # labels
    root.minsize(width=1530, height=1000)
    S1Lb = Label(root, text="Symptom 1")
    S1Lb.grid(row=9, column=1, pady=5, sticky=W)
    S2Lb = Label(root, text="Symptom 2")
    S2Lb.grid(row=10, column=1, pady=5, sticky=W)
    S3Lb = Label(root, text="Symptom 3")
    S3Lb.grid(row=11, column=1, pady=5, sticky=W)
    S4Lb = Label(root, text="Symptom 4")
    S4Lb.grid(row=12, column=1, pady=5, sticky=W)
    S5Lb = Label(root, text="Symptom 5")
    S5Lb.grid(row=13, column=1, pady=5, sticky=W)
    lrLb = Label(root, text="Predicted disease")
    lrLb.grid(row=14, column=1, pady=10, sticky=W)

    # entries: drop-down menus listing the symptoms alphabetically
    OPTIONS = sorted(l1)
    S1En = OptionMenu(root, Symptom1, *OPTIONS)
    S1En.grid(row=9, column=1)
    S2En = OptionMenu(root, Symptom2, *OPTIONS)
    S2En.grid(row=10, column=1)
    S3En = OptionMenu(root, Symptom3, *OPTIONS)
    S3En.grid(row=11, column=1)
    S4En = OptionMenu(root, Symptom4, *OPTIONS)
    S4En.grid(row=12, column=1)
    S5En = OptionMenu(root, Symptom5, *OPTIONS)
    S5En.grid(row=13, column=1)

    # handler for the Predict button: run all three classifiers
    # once at least one symptom has been selected
    def login():
        if (Symptom1.get() == "None" and Symptom2.get() == "None" and Symptom3.get() == "None"
                and Symptom4.get() == "None" and Symptom5.get() == "None"):
            messagebox.showerror("Error", "Enter symptoms")
        else:
            analysis.l5 = []
            analysis.a = []
            analysis.dis = []
            DecisionTree()
            randomforest()
            NaiveBayes()

    dst = Button(root, text="Predict", command=login)
    dst.grid(row=10, column=2, padx=10)

    # text field that displays the predicted disease
    t1 = Text(root, height=1, width=40)
    t1.grid(row=14, column=1, padx=10)
    root.mainloop()


def signup():
    # signup database connect
    def action():
        if (first_name.get() == "" or email.get() == "" or age.get() == ""
                or password.get() == "" or very_pass.get() == ""):
            messagebox.showerror("Error", "All Fields Are Required", parent=winsignup)
        elif password.get() != very_pass.get():
            messagebox.showerror("Error", "Password & Confirm Password Should Be Same",
                                 parent=winsignup)
        else:
            con = mysql.connector.connect(host="localhost", user="pma",
                                          password="1280", database="mysql")
            cur = con.cursor()
            # Reject the signup if this email is already registered.
            sql = "select * from data where email=%s"
            v = (email.get(),)
            cur.execute(sql, v)
            row = cur.fetchone()
            if row is not None:
                con.close()
                messagebox.showerror("Error", "Email Already Exists", parent=winsignup)
            else:
                sql = "insert into data (name,email,age,gender,password) values(%s,%s,%s,%s,%s)"
                val = (first_name.get(), email.get(), age.get(), var.get(), password.get())
                cur.execute(sql, val)
                con.commit()
                con.close()
                messagebox.showinfo("Success", "Registration Successful", parent=winsignup)
                clear()

    # clear data function
    def clear():
        first_name.delete(0, END)
        email.delete(0, END)
        age.delete(0, END)
        var.set(1)
        password.delete(0, END)
        very_pass.delete(0, END)

    # start Signup Window
    winsignup = Tk()
    l = ["Male", "Female"]
    winsignup.title("R")
    winsignup.maxsize(width=500, height=600)
    winsignup.minsize(width=500, height=600)

    # heading label
    heading = Label(winsignup, text="Signup", font='Verdana 20 bold')
    heading.place(x=80, y=60)

    # form data labels (these names are rebound to the Entry widgets below;
    # the labels stay on screen because they have already been placed)
    first_name = Label(winsignup, text="Name :", font='Verdana 10 bold')
    first_name.place(x=80, y=130)
    email = Label(winsignup, text="Email :", font='Verdana 10 bold')
    email.place(x=80, y=160)
    age = Label(winsignup, text="Age :", font='Verdana 10 bold')
    age.place(x=80, y=190)
    Gender = Label(winsignup, text="Gender :", font='Verdana 10 bold')
    Gender.place(x=80, y=220)
    password = Label(winsignup, text="Password :", font='Verdana 10 bold')
    password.place(x=80, y=258)
    very_pass = Label(winsignup, text="Verify Password:", font='Verdana 10 bold')
    very_pass.place(x=80, y=290)

    # Entry Box ------------------------------------------------------------------
    first_name = StringVar()
    email = StringVar()
    age = IntVar(winsignup, value='0')
    var = StringVar()
    password = StringVar()
    very_pass = StringVar()
    first_name = Entry(winsignup, width=40, textvariable=first_name)
    first_name.place(x=200, y=133)
    email = Entry(winsignup, width=40, textvariable=email)
    email.place(x=200, y=163)
    age = Entry(winsignup, width=40, textvariable=age)
    age.place(x=200, y=193)
    ttk.Radiobutton(winsignup, text='Male', value=0, variable=var).place(x=200, y=220)
    ttk.Radiobutton(winsignup, text='Female', value=1, variable=var).place(x=200, y=238)
    # Mask the password as it is typed, matching the verify field.
    password = Entry(winsignup, width=40, show="*", textvariable=password)
    password.place(x=200, y=263)
    very_pass = Entry(winsignup, width=40, show="*", textvariable=very_pass)
    very_pass.place(x=200, y=293)

    # buttons: signup, clear, and switch to the login window
    btn_signup = Button(winsignup, text="Signup", font='Verdana 10 bold', command=action)
    btn_signup.place(x=200, y=323)
    btn_login = Button(winsignup, text="Clear", font='Verdana 10 bold', command=clear)
    btn_login.place(x=280, y=323)
    sign_up_btn = Button(winsignup, text="Switch To Login", command=fun)
    sign_up_btn.place(x=350, y=20)

    winsignup.mainloop()


def fun():
    def clear():
        userentry.delete(0, END)
        passentry.delete(0, END)

    def login():
        mydb = mysql.connector.connect(host="localhost", user="pma",
                                       password="1280", database="mysql")
        if user_name.get() == "" or password.get() == "":
            messagebox.showerror("Error", "Enter User Name And Password", parent=win)
        else:
            try:
                cur = mydb.cursor()
                cur.execute("select * from data where email=%s and password = %s",
                            (user_name.get(), password.get()))
                row = cur.fetchone()
                if row is None:
                    messagebox.showerror("Error", "Invalid User Name And Password", parent=win)
                else:
                    # Valid credentials: close the login window and open the predictor.
                    win.destroy()
                    a()
                    # messagebox.showinfo("Success", "Successfully Login", parent=win)
            except Exception as es:
                messagebox.showerror("Error", f"Error Due to : {str(es)}", parent=win)

    # start Login Window
    win = Tk()
    win.title("Disease Prediction App")
    win.maxsize(width=500, height=500)
    win.minsize(width=500, height=500)
    heading = Label(win, text="Login", font='Verdana 25 bold')
    heading.place(x=80, y=150)

    username = Label(win, text="Email :", font='Verdana 10 bold')
    username.place(x=80, y=220)
    userpass = Label(win, text="Password :", font='Verdana 10 bold')
    userpass.place(x=80, y=260)

    user_name = StringVar()
    password = StringVar()
    userentry = Entry(win, width=40, textvariable=user_name)
    userentry.focus()
    userentry.place(x=200, y=223)
    passentry = Entry(win, width=40, show="*", textvariable=password)
    passentry.place(x=200, y=260)

    btn_login = Button(win, text="Login", font='Verdana 10 bold', command=login)
    btn_login.place(x=200, y=293)
    btn_login = Button(win, text="Clear", font='Verdana 10 bold', command=clear)
    btn_login.place(x=260, y=293)
    sign_up_btn = Button(win, text="Switch To Sign up", command=signup)
    sign_up_btn.place(x=350, y=20)

    win.mainloop()


fun()
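
The signup and login routines in the listing store the password in the `data` table and compare it as plain text. The following is a minimal sketch, using only the Python standard library, of how a salted hash could be stored instead. The helper names `hash_password` and `verify_password`, the `salt$hash` storage format and the iteration count are assumptions made for illustration and are not part of the project code.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor; tune for the target hardware


def hash_password(password: str) -> str:
    """Return 'salt_hex$hash_hex', suitable for a text column (hypothetical format)."""
    salt = secrets.token_bytes(16)  # fresh random salt for every user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt.hex() + "$" + digest.hex()


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 bytes.fromhex(salt_hex), ITERATIONS)
    return hmac.compare_digest(digest.hex(), digest_hex)
```

Under these assumptions, signup would insert `hash_password(password.get())` in place of the raw value, and login would select the row by email alone and then call `verify_password` on the stored string. The password column would need room for roughly 100 characters.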