
10 IV April 2022

https://doi.org/10.22214/ijraset.2022.41230
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue IV Apr 2022- Available at www.ijraset.com

Disease Prediction using Machine Learning


Magilan G¹, Sarath Vijay R², Swaitha K³, Jona J.B⁴
¹,²,³Department of Computer Applications, Coimbatore Institute of Technology, India
⁴Associate Professor, Department of Computer Applications, Coimbatore Institute of Technology, India

Abstract: This system gives users information and guidance for looking after their health and shows how a likely disease can be
identified through prediction. The health industry today plays a major role in curing patients' diseases, so the system helps the
industry inform users, and it is also useful when a user does not want to travel to a hospital or clinic: by entering the symptoms
and other relevant information, the user can learn which disease he/she may be affected by. The health industry can also benefit
from this system by asking the user for symptoms and entering them into the system, which within a few seconds reports the likely
condition with reasonable accuracy. Disease Prediction Using Machine Learning is built entirely with machine learning and the
Python programming language, and it predicts diseases using datasets previously made available by hospitals.
Keywords: Prediction, Decision Tree, Random forest, Naive Bayes

I. INTRODUCTION
The purpose of this project, “Disease Prediction Using Machine Learning”, is to predict a patient's disease accurately from their
general information and symptoms. If the prediction is made in the early stages of the disease, and all other necessary measures are
taken, the disease can be cured; more generally, such a prediction system can be very useful to the health industry. The overall aim
is to provide predictions for the various commonly occurring diseases that, when unchecked and sometimes ignored, can turn fatal
and cause many problems for patients as well as their family members. The system predicts the most probable disease based on the
symptoms. The health industry is information rich yet knowledge poor, and it is a vast industry in which much work remains to be
done. Using the algorithms, techniques and methodologies described here, we have built this project to help people in need.

II. LITERATURE REVIEW


1) Diabetes is one of the most lethal diseases in the world. It also gives rise to various other disorders, for example heart
failure, blindness and kidney disease. A patient normally has to visit a diagnostic centre and collect their reports after
consultation, which costs time and money on every visit. With the growth of machine learning methods, this problem can be
addressed: advanced systems based on data processing can forecast whether a patient has diabetes or not. Furthermore,
predicting the disease early allows patients to be treated before it becomes critical. Data mining can extract knowledge from
the large quantity of diabetes-related data. The aim of this analysis is to develop a system that can predict the diabetic risk
level of a patient with better accuracy. Model development is based on classification methods, namely the Decision Tree, ANN,
Naive Bayes and SVM algorithms. The models give precisions of 85% for Decision Tree, 77% for Naive Bayes and 77.3% for
Support Vector Machine. The outcomes show significant accuracy for these methods.
2) Advances in information technology have brought great progress in healthcare technologies in various domains. However,
these new technologies have also made healthcare data not only much larger but also much more difficult to handle and
process. Moreover, because the data are created by a variety of devices within a short time span, they are stored in different
formats and generated quickly, which can, to a large extent, be regarded as a big data problem. To provide a more convenient
healthcare service and environment, this paper proposes a cyber-physical system for patient-centric healthcare applications
and services, called Health-CPS, built on cloud and big data analytics technologies. The system consists of a data collection
layer with a unified standard, a data management layer for distributed storage and parallel computing, and a data-oriented
service layer. The results of this study show that cloud and big data technologies can be used to enhance the performance of
the healthcare system so that people can then enjoy various smart healthcare applications and services.


3) SVM performs well when little is known about the data. It also works well with unstructured and semi-structured data such as
text, images and trees. The drawback of the SVM algorithm is that several key parameters must be set correctly to achieve the
best classification results for a given problem. Decision tree: It is easy to understand and to express as rules. Decision trees
are unstable, however: a minor modification in the data can cause a large change in the structure of the optimal decision tree,
and they are often relatively inaccurate. Naive Bayes: It is robust and handles missing values by ignoring them in the
probability estimation. It is sensitive to how inputs are prepared and prone to bias as the size of the training dataset
increases. ANN: It gives good predictions and is easy to implement. It has difficulty dealing with big data and complex models
and requires a large amount of processing time.
4) Diabetes is caused by an excessive concentration of sugar in the blood. It is currently considered one of the most lethal
diseases in the world, and people all around the globe are affected by it, knowingly or unknowingly. Diabetes also causes other
conditions such as heart attack, paralysis, kidney disease and blindness. Numerous computer-based detection systems have been
designed for anticipating and analysing diabetes. The usual process of identifying diabetic patients takes considerable time and
money, but with the rise of machine learning it has become possible to develop a solution to this problem. The authors therefore
developed an architecture that can predict whether a patient has diabetes or not. The main aim of that work is to build a web
application based on the higher prediction accuracy of a powerful machine learning algorithm. It uses the Pima Indian benchmark
dataset, which supports predicting the onset of diabetes from diagnostic measurements. With a prediction accuracy of 82.35%, an
Artificial Neural Network (ANN) shows a significant improvement in accuracy, which motivated the development of an Interactive
Web Application for Diabetes Prediction.

III. ALGORITHM
A. Decision Tree Algorithm
A decision tree builds regression or classification models in the form of a tree structure. It breaks a dataset down into smaller and
smaller subsets while, at the same time, an associated decision tree is incrementally developed. The final result is a tree with decision
nodes and leaf nodes. A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy), each
representing a value of the attribute tested. A leaf node (e.g., Hours Played) represents a decision on the numerical target. The
topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both
categorical and numerical data. Depending on the test outcome, the classification algorithm branches towards the appropriate child
node, where the process of testing and branching repeats until it reaches a leaf node. The leaf or terminal nodes correspond to the
decision outcomes. DTs are easy to interpret and quick to learn, and are a common component of many medical diagnostic
protocols [25]. When traversing the tree to classify a sample, the outcomes of all tests at each node along the path provide
sufficient information to conjecture about its class. An illustration of a DT with its components and rules is depicted.
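The traversal described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the tree is hand-written rather than learned, the `Windy` attribute and all leaf values are hypothetical, and only the `Outlook` node with its three branches comes from the example in the text.

```python
# Hypothetical hand-built tree: decision nodes name the attribute they test,
# leaf nodes hold the predicted numerical target (e.g., Hours Played).
TREE = {
    "attribute": "Outlook",  # root node (the best predictor)
    "branches": {
        "Sunny": {
            "attribute": "Windy",  # assumed second test, for illustration only
            "branches": {True: {"leaf": 25.0}, False: {"leaf": 40.0}},
        },
        "Overcast": {"leaf": 45.0},
        "Rainy": {"leaf": 30.0},
    },
}


def predict(node, sample):
    """Follow the branch matching each test until a leaf is reached.

    Returns the leaf value and the list of (attribute, value) tests on the path.
    """
    path = []
    while "leaf" not in node:
        attribute = node["attribute"]
        value = sample[attribute]
        path.append((attribute, value))
        node = node["branches"][value]
    return node["leaf"], path


value, path = predict(TREE, {"Outlook": "Sunny", "Windy": False})
print(value, path)
```

The returned path is the sequence of test outcomes that, as the paragraph notes, explains why the sample received its prediction.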
Random Forest is a supervised learning algorithm. It is an ensemble of machine learning classifiers that uses bagging to
improve the performance of the Decision Tree. It combines tree predictors, where each tree depends on a random vector that is
independently sampled, and all trees share the same distribution. Random Forests split each node using the best among a subset of
predictors randomly chosen at that node, instead of splitting on the best among all variables. The worst-case time complexity of
learning with Random Forests is O(M(dn log n)), where M is the number of trees grown, n is the number of instances, and d is the
data dimension. It can be used for both classification and regression, and it is flexible and easy to use. A forest consists of
trees, and it is said that the more trees it has, the more robust the forest is. Random Forests create Decision Trees on randomly
selected data samples, get a prediction from each tree and select the best solution by means of voting. They also provide a good
indicator of feature importance.
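The bagging and voting steps can be illustrated with a small standard-library sketch. It is a simplification under stated assumptions, not the paper's implementation: each "tree" is a one-level stump, the random feature subset is drawn once per tree, and all function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Bagging: draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def train_stump(rows, feature_indices):
    """Fit a one-level tree on the best feature among a random subset."""
    best = None
    for f in feature_indices:
        counts = {}
        for features, label in rows:
            counts.setdefault(features[f], Counter())[label] += 1
        # For each value of feature f, predict the majority label seen with it.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[features[f]] != label for features, label in rows)
        if best is None or errors < best[0]:
            best = (errors, f, rule)
    default = Counter(label for _, label in rows).most_common(1)[0][0]
    return best[1], best[2], default


def stump_predict(stump, features):
    feature, rule, default = stump
    return rule.get(features[feature], default)


def train_forest(rows, n_trees=25, n_features=2, seed=0):
    """Grow n_trees stumps, each on its own bootstrap sample and feature subset."""
    rng = random.Random(seed)
    total = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = bootstrap_sample(rows, rng)
        subset = rng.sample(range(total), n_features)
        forest.append(train_stump(sample, subset))
    return forest


def forest_predict(forest, features):
    """Each tree votes; the most common label wins."""
    votes = Counter(stump_predict(stump, features) for stump in forest)
    return votes.most_common(1)[0][0]


# Toy data: (binary symptom vector, label); values are illustrative only.
rows = [((1, 1, 1), "A")] * 5 + [((0, 0, 0), "B")] * 5
forest = train_forest(rows)
print(forest_predict(forest, (1, 1, 1)), forest_predict(forest, (0, 0, 0)))
```

A full Random Forest grows deep trees and redraws the feature subset at every node; the stump version keeps the two ideas from the text, random resampling and majority voting, visible in isolation.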

B. Naive Bayes
Naive Bayes is a set of supervised learning algorithms based on Bayes’ theorem with the “naive” assumption of independence
between every pair of features. Despite its simplicity, it can outperform more sophisticated classification methods. If there are
input variables x and an output variable y, Bayes’ theorem states the following relationship:
p(y|x) = p(y) · p(x|y) / p(x)
In this project, the Gaussian Naive Bayes algorithm has been implemented. In Gaussian Naive Bayes, the likelihood of the features is
assumed to be Gaussian, i.e. all continuous values x associated with class y are distributed according to a Gaussian distribution.
Given a continuous attribute x in the training data, the data is first segmented by the class y. Then, the mean and variance of x in
each class are computed.
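As a rough sketch of these steps for a single continuous attribute, the code below segments the values by class, computes each class's prior, mean and variance, and scores a new value with the standard normal density. The function names and the temperature readings are hypothetical and chosen only for illustration; the population variance is assumed.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(values, labels):
    """Segment the attribute by class, then store (prior, mean, variance) per class."""
    by_class = defaultdict(list)
    for v, y in zip(values, labels):
        by_class[y].append(v)
    model = {}
    for y, xs in by_class.items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)  # population variance
        model[y] = (len(xs) / len(values), mean, var)
    return model


def gaussian_pdf(v, mean, var):
    """Normal density at v; assumes var > 0."""
    return math.exp(-((v - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def predict(model, v):
    """Return the class maximizing prior * likelihood (p(x) is the same for all classes)."""
    return max(model, key=lambda y: model[y][0] * gaussian_pdf(v, model[y][1], model[y][2]))


# Illustrative body temperatures in degrees Celsius, not data from the paper.
values = [36.5, 36.8, 37.0, 38.6, 39.0, 39.4]
labels = ["healthy", "healthy", "healthy", "fever", "fever", "fever"]
model = fit_gaussian_nb(values, labels)
print(predict(model, 36.9), predict(model, 39.1))
```

With several attributes, the naive independence assumption means the per-attribute likelihoods are simply multiplied together before comparing classes.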


Let μ be the mean of the values in x associated with class y, and let σ² be the variance of the values in x associated with class y.
Given some observation value v, the probability distribution of v given class y, p(x = v | y), can be computed by plugging v into
the equation for a normal distribution parameterized by μ and σ²:
p(x = v | y) = (1 / sqrt(2πσ²)) · exp(−(v − μ)² / (2σ²))
In the example considered in this figure, the likelihood of ‘white’ given ‘green’ is 0.025 (1 ÷ 40) and the likelihood of ‘white’
given ‘red’ is 0.15 (3 ÷ 20). Although the prior probability indicates that the new ‘white’ object is more likely to belong to the
‘green’ class, the likelihood shows that it is more likely to be in the ‘red’ class. In Bayesian analysis, the final classifier is
produced by combining both sources of information (i.e., the prior probability and the likelihood value). The ‘multiplication’
function is used to combine these two types of information, and the product is termed the ‘posterior’ probability. Finally, the
posterior probability of ‘white’ being ‘green’ is 0.017 (0.67 × 0.025) and the posterior probability of ‘white’ being ‘red’ is 0.049
(0.33 × 0.15). Thus, the new ‘white’ object should be classified as a member of the ‘red’ class according to the NB technique.
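The arithmetic of this example can be checked with a short snippet. The priors and likelihoods are the ones quoted in the text; the variable names are illustrative, and the products are unnormalized scores, which is all that is needed to compare the two classes.

```python
# Priors and likelihoods quoted in the example above.
priors = {"green": 0.67, "red": 0.33}
likelihoods = {"green": 1 / 40, "red": 3 / 20}  # p(white | class)

# Posterior score = prior * likelihood (not divided by p(white)).
posteriors = {c: priors[c] * likelihoods[c] for c in priors}
predicted = max(posteriors, key=posteriors.get)

for c, score in posteriors.items():
    print(c, score)
print("predicted class:", predicted)
```

The scores agree with the figures in the text up to rounding, and dividing each by their sum would turn them into probabilities without changing which class wins.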

Fig 3. Naive Bayes


IV. IMPLEMENTATION
The project Disease Prediction Using Machine Learning is developed to catch general diseases in their early stages. As we all
know, in the competitive environment of economic development people have become so preoccupied that they neglect their health;
according to research, 40% of people ignore general diseases, which leads to harmful diseases later. The interface of this project is
built using Python's interface library, Tkinter. First, the user must register in the system in order to use the prediction, providing
a username, email-id, phone, age and password. These values are stored in the file system, and the user then has the option to move
forward or leave. Next, the user must log in to the system using the username and password provided at registration. If the user
enters an incorrect username with a correct password, an error message reports the incorrect username; likewise, an incorrect
password with a correct username produces an error message. After logging in, the user must enter their name and select symptoms
from the given drop-down menu. For a more accurate result the user should enter all the symptoms asked for, and the system then
gives the result. The prediction is made with three machine learning algorithms: Decision Tree, Random Forest and Naive Bayes.
Once the user has entered all the symptoms, he/she presses the button of the desired algorithm; there are three buttons for the three
algorithms. For instance, if the user enters all symptoms and presses only the Random Forest button, the result is computed using
that algorithm alone. We have used three algorithms in this way to give a clearer picture of the results, so that the user can be
satisfied with the predicted result.
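One way the symptom selection and the three buttons could connect to the models is sketched below, without the Tkinter layer. Everything here is an assumption made for illustration: the symptom list, the binary encoding, the function names and the stand-in models are hypothetical and are not taken from the project's code.

```python
# Hypothetical symptom vocabulary; the real system offers its own drop-down list.
SYMPTOMS = ["fever", "cough", "headache", "fatigue"]


def encode_symptoms(selected, all_symptoms=SYMPTOMS):
    """Turn the selected symptoms into a 0/1 vector in a fixed order."""
    unknown = set(selected) - set(all_symptoms)
    if unknown:
        raise ValueError(f"unknown symptoms: {sorted(unknown)}")
    return [1 if s in selected else 0 for s in all_symptoms]


def run_prediction(algorithm, selected, models):
    """Dispatch to the model behind the pressed button.

    `models` maps a button name to any callable that accepts a feature vector.
    """
    if algorithm not in models:
        raise KeyError(f"no model registered for {algorithm!r}")
    return models[algorithm](encode_symptoms(selected))


# Stand-in models that just echo their input, so the sketch runs on its own.
models = {
    "Decision Tree": lambda x: ("decision-tree result", x),
    "Random Forest": lambda x: ("random-forest result", x),
    "Naive Bayes": lambda x: ("naive-bayes result", x),
}
print(run_prediction("Random Forest", ["fever", "fatigue"], models))
```

Keeping the encoding in one function means all three algorithms receive the same input vector, whichever button is pressed.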

V. RESULT
The prediction system presents a convenient user interface with fields for the name and symptoms and a button for each prediction
algorithm; the result is predicted with the algorithm selected.
It also displays the accuracy of each algorithm. Comparing the Decision Tree, Random Forest and Naive Bayes algorithms, Random
Forest has the best accuracy, 0.96, and is therefore the best-suited algorithm for this model.
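Accuracy here is assumed to mean the fraction of test samples whose predicted label matches the true label. A minimal sketch of that computation and of picking the best-scoring algorithm follows; the helper names are hypothetical, and the labels in the usage lines are placeholders rather than results from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def best_algorithm(scores):
    """Name of the algorithm with the highest accuracy in a {name: accuracy} dict."""
    return max(scores, key=scores.get)


# Placeholder labels, used only to show the call.
print(accuracy(["flu", "cold", "flu", "flu"], ["flu", "cold", "cold", "flu"]))
```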


Fig 4. User Interface

Fig 5. Accuracy Rate of Algorithm

VI. CONCLUSION
The Prediction Engine allows the user to check whether he/she has any disease or disorder based on the given symptoms. The user
interacts with the Prediction Engine by filling in a set of symptoms, which forms the parameter set provided as input to the trained
models. The Prediction Engine makes use of three algorithms to predict the presence of a disease, namely Decision Tree, Random
Forest and Naive Bayes.

The reasons for choosing these three algorithms are:


1) They are effective when the training data is large.
2) A single dataset can be provided as input to all three algorithms with minimal or no modification.
3) A common scaler is used to normalize the input provided to all three algorithms.
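The idea of a common scaler in point 3 can be sketched as follows. The paper does not say which normalization is used, so standardization (zero mean, unit variance per column) is assumed here, and the function names and numbers are hypothetical.

```python
import math


def fit_scaler(rows):
    """Compute per-column mean and standard deviation from the training rows."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(c) / n for c in columns]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / n) or 1.0
        for c, m in zip(columns, means)
    ]
    return means, stds


def transform(rows, scaler):
    """Apply the same fitted scaler to any rows, whichever model consumes them."""
    means, stds = scaler
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


train = [[1.0, 10.0], [3.0, 10.0]]  # illustrative values only
scaler = fit_scaler(train)
print(transform(train, scaler))
```

Fitting the scaler once on the training data and reusing it for every algorithm keeps the three models' inputs identical, which is what makes their accuracies comparable.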

REFERENCES
[1] Kaveeshwar, S.A., and Cornwall, J., 2014, “The current state of diabetes mellitus in India”, AMJ, 7(1), pp. 45-48.
[2] Dean, L., McEntyre, J., 2004, “The Genetic Landscape of Diabetes” [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); Chapter 1,
Introduction to Diabetes. 2004 Jul 7.
[3] Y. Zhang, M. Qiu, C.-W. Tsai, M. M. Hassan and A. Alamri, "Health-CPS: Healthcare cyber-physical system assisted by cloud and big data", IEEE Syst. J.,
vol. 11, no. 1, pp. 88-95, Mar. 2017.
[4] Allen Daniel Sunny, Sajal Kulshreshtha, Satyam Singh, Srinabh, Mohan Ba and H Sarojadevi, "Disease Diagnosis System By Exploring Machine
Learning Algorithms", International Journal of Innovations in Engineering and Technology (IJIET), vol. 10, no. 2, May
