
CLASSIFICATION OF DIABETES USING MACHINE LEARNING ALGORITHM

CHAPTER 1
INTRODUCTION
Diabetes is a chronic disease that occurs either when the pancreas does not
produce enough insulin or when the body cannot effectively use insulin.
Insulin is a hormone that controls the level of blood glucose which is used as
an energy source to meet the needs of organs and tissues. Insulin helps our
body take advantage of nutrients, especially carbohydrates. When this process
is deficient, these nutrients, in addition to lipids and fats, cannot accomplish
their goal, which is to be transformed into energy. With this abnormality, the
glucose accumulates in the blood causing a significant deterioration in human
health. According to the WHO, in 2014, 422 million people in the world had
diabetes, with a prevalence of 8.5% among the adult population. In Latin
America, in 2003, the number of cases was estimated at 19 million adults (20
to 79 years old) affecting 10% to 15% of the adult population; this suggests
that there could be at least 33 million by the year 2030. This assessment is
based only on demographic changes such as population age and urbanization
and represents an enormous problem for older adults and a public burden.

There are two main types of diabetes. Type 1 can appear at any age but
is diagnosed more frequently in children, teenagers, and young adults. It is
caused by a loss or alteration of the insulin-producing cells, called pancreatic
beta cells. The suspicion of type 1 DM is based on the presentation of acute
symptoms such as weight loss, major general affectation, ketosis, and
hyperglycaemia. The absence or insufficient production of insulin by the body
is a consequence of this beta cell dysfunction. Type 2 DM appears as a
result of a defect in insulin secretion in an environment of insulin resistance.
For diagnosis in elderly persons, the guidelines provided by the
International Diabetes Federation are an excellent reference: they
recommend a complete assessment and timely follow-up, carried out
according to the patient's dependency category, and they also define the role
of the caregiver.

1.1 METHODS TO MEASURE GLUCOSE:

The main concern of a person with diabetes is to keep the blood glucose
level within its normal limits.

DEPT.ECE BWEC Page | 1



Methods for monitoring these levels are classified into three categories:
invasive, minimally invasive, and non-invasive, described as follows:

➢ The invasive category includes the most widely used techniques, because they
offer the greatest precision owing to direct contact with the patient’s
blood. The traditional procedure is to prick a finger, which is painful for the
patient. The measurements must be carried out under a rigorous cleaning
regimen, since infections can occur.

➢ The minimally invasive category includes one of the techniques under
investigation. This technique makes use of micropores, which are small holes
in the skin created by laser radiation. When the pores are open, a device
applies a continuous vacuum pressure that extracts a small amount of
transdermal body fluid. An enzyme-based electrode then processes the sample
and measures the glucose level.

➢ The non-invasive category includes non-invasive glucometers, in which the
objective is to replace the blood variable with another one that can be
accessed by external means. They operate by placing a sensor on a
specific region of the body to obtain a glucose reading. The reliability of
the values is reported to be almost 100%, because, according to experts, the
amount of glucose in the blood is equal to the amount found on the skin of a
patient. Biofluids such as saliva, urine, sweat, or tears have also been
studied for non-invasive glucose tests, but with these it is not possible to
track the glucose levels continuously.

Over the past decade, the Internet of Things (IoT) has made everyday objects
interconnected, and it is considered the next technological revolution. Smart
health monitoring, smart parking, smart homes, smart cities, smart climate,
industrial sites, and agricultural fields are some of the applications of IoT.
One of the most significant uses of IoT is in healthcare management, where it
provides health and environmental condition tracking. In essence, IoT links
devices to the internet by means of sensors and networks. These connected
components can be used in devices for health monitoring.


CHAPTER 2

LITERATURE REVIEW
✓ Michael O. Olusanya, Ropo Ebenezer Ogunsakin, Meenu Ghai and
Matthew Adekunle Adeleke, “Classification Models for the Prediction of
Type 2 Diabetes”, International Journal of Environmental Research and
Public Health, Vol. 19, Issue 21, 14280, 1st November 2022, pp. 1-19. In this
systematic meta-analysis, the decision tree model has the best prediction
performance, with excellent accuracy compared to other soft-computing
models.
✓ Henock M. Deberneh and Intaek Kim, “Prediction of Type 2 Diabetes
Based on Machine Learning Algorithm”, International Journal of
Environmental Research and Public Health, Vol. 18, 3317, 23 March 2021,
pp. 1-14. The model can provide both clinicians and patients with valuable
information on the incidence of T2D ahead of time, which would help patients
✓ G. A. Pethunachiyar, “Classification of Diabetes Patients Using Kernel
Based Support Vector Machines”, 2020 International Conference on
Computer Communication and Informatics (ICCCI-2020), Jan. 22-24,
2020, Coimbatore, India. The SVM with a linear kernel function
produced 100%, and the SVM with a radial kernel produced 99%.
✓ Madhusmita Rout, Amandeep Kaur, “Prediction of Diabetes Risk based on
Machine Learning Techniques”, 2020 International Conference on
Intelligent Engineering and Management (ICIEM). The integration of
classification algorithms and clustering with other technologies such as
IoT and cloud computing helps build intelligent systems and monitoring
tools.
✓ Aeshah Saad Alanazi, Mohd A. Mezher, “Using Machine Learning
Algorithms For Prediction Of Diabetes Mellitus”, 2020 International
Conference on Computing and Information Technology. The obtained
results indicate that the RF algorithm has a prediction accuracy of 98%
with a precision of 100%.


✓ K. Vijaya Kumar, B. Lavanya, I. Nirmala, S. Sofia Caroline, “Random
Forest Algorithm for the Prediction of Diabetes”, Proceedings of the
International Conference on Systems Computation Automation and
Networking. The aim is to develop a system which can perform early
prediction of diabetes for a patient with a higher accuracy.
✓ D. Vigneswari, N. Komal Kumar, V. Ganesh Raj, A. Gugan, S. R. Vikash,
“Machine Learning Tree Classifiers in Predicting Diabetes Mellitus”, 2019
5th International Conference on Advanced Computing & Communication
Systems (ICACCS). The Logistic Model Tree (LMT) classifier achieved an
accuracy of 79.31% with an average time of 0.49 sec to build the model;
both figures are higher than those of the Random Forest tree classifier, with
78.54% accuracy and 0.04 sec.
✓ S M Hasan Mahmud, Md Altab Hossin, Md. Razu Ahmed, “Machine
Learning Based Unified Framework for Diabetes Prediction”, Proceedings
of the 2018 International Conference on Big Data Engineering and
Technology, 2018. The algorithms covered are Artificial Neural Network
(ANN), Support Vector Machine (SVM), Logistic Regression (LR), Decision
Tree (DT), Random Forest (RF) and Naive Bayes (NB).
✓ Rahul Joshi & Dr. Preethi Mulay, “Analysis and Prediction of Diabetes
Mellitus using Machine Learning Algorithm”, 2018, International Journal
of Pure and Applied Mathematics. In this study the proposed method
provides a high accuracy of 90.36%, while the decision stump provides a
lower accuracy than the others, 83.72%.
✓ Samrat Kumar Dey, Ashraf Hossain, Md. Mahbubur Rahman,
“Implementation of a Web Application to Predict Diabetes Disease: An
Approach Using Machine Learning Algorithm”, 2018 21st International
Conference of Computer and Information Technology (ICCIT). Among the
different machine learning algorithms, the Artificial Neural Network (ANN)
provides the highest accuracy with the Min-Max scaling method on the Pima
Indian dataset.
✓ Sofia Benbelkacem, Baghdad Atmani, “Random Forests for Diabetes
Diagnosis”, 2019 International Conference on Computer and Information
Sciences (ICCIS). Good results have been obtained with random
forests on the Pima dataset.


✓ Ayman Mir, Sudhir N. Dhage, “Diabetes Disease Prediction using
Machine Learning on Big Data of Healthcare”, 2018 Fourth International
Conference on Computing Communication Control and Automation
(ICCUBEA). The overall performance of Support Vector Machine to
predict the diabetes disease is better than Naive Bayes, Random Forest
✓ Raid M. Khalil, Adel Al-Jumaily, “Machine Learning Based Prediction of
Depression among Type 2 Diabetic Patients”, 2017 12th International
Conference on Intelligent Systems and Knowledge Engineering (ISKE). It
is clear from the results that the SVM classifier generates more precise
results than the others.
✓ Amit Kumar Dewangan, Pragati Agrawal, “Classification of Diabetes
Mellitus Using Machine Learning Techniques”, 2015, International Journal
of Engineering and Applied Sciences (IJEAS). This model achieved
the highest accuracy of 81.89% with 6 features, along with the highest
sensitivity of 64.10% and the highest specificity of 90.90%.
✓ Anjali C, Veena Vijayan V, “Prediction and Diagnosis of Diabetes
Mellitus - A Machine Learning Approach”, 2015 IEEE Recent Advances in
Intelligent Computational Systems (RAICS). The computer information
system with the AdaBoost classifier provides an accuracy of 80.729% for
diabetes with a very low error rate.
✓ V. Anuja Kumari, R. Chitra, “Classification Of Diabetes Disease Using
Support Vector Machine”, 2013, International Journal of Engineering
Research and Applications. The SVM approach can be successfully used to
detect a common disease with simple clinical measurements, without
laboratory tests.
✓ Naveen Kumar Shrivastava, Praneth Saurabh and Bhupendra Verma, “An
Efficient Approach Parallel Support Vector Machine for Classification of
Diabetes Dataset”, 2011, International Journal of Computer Applications.
The proposed method maintained approximately the same accuracy
as the normal SVM. Dividing the data into a larger number of clusters
decreases the accuracy, but this can be controlled by selecting a proper
starting point for K-means clustering.
✓ Aishwarya Mujumdar, Dr. Vaidehi V, “Diabetes Prediction using
Machine Learning Algorithms”, 2019, International Conference on Recent
Trends in Advanced Computing. The classification has been done using
various algorithms, of which Logistic Regression gives the highest accuracy
of 96%. Application of a pipeline gave the AdaBoost classifier as the best
model, with an accuracy of 98.8%.
✓ Pratap Kumar Saha, Nazmus Sakib Patwary, Ifthakhar Ahmed, “A
Widespread Study of Diabetes Prediction Using Several Machine Learning
Techniques”, 2019 22nd International Conference on Computer and
Information Technology (ICCIT). The neural network gave the best
accuracy (80.4%) of all the methods, and Naive Bayes took less
execution time than any other method.
✓ Debadri Dutta, Debpriyo Paul, Parthajeet Ghosh, “Analysing Feature
Importances for Diabetes Prediction using Machine Learning”, 2018 IEEE
9th Annual Information Technology, Electronics and Mobile
Communication Conference (IEMCON). Random Forest is the most suitable
algorithm for predicting diabetes, giving an accuracy of around 84%.


CHAPTER 3
3.1 EXISTING METHOD:

Several researchers have contributed to the prediction of diabetes.
Diabetes is one of the most expensive chronic diseases. The majority of
diabetes patients are asymptomatic, which delays standard clinical
laboratory examinations. Researchers have therefore looked at machine
learning algorithms that help to predict diabetes, including artificial neural
networks, logistic regression, Support Vector Machine and Naive Bayes.
They applied these algorithms to predict diabetes and compared them to
identify which algorithm produces the best accuracy. In recent years, the
healthcare industry has shown rapid growth and has been a major contributor
to revenue and employment. A few years ago, the diagnosis of diseases and
abnormalities in the human body was only possible after a
physical examination in the hospital. Most patients had to stay in the hospital
throughout their treatment period. This increased healthcare costs
and also strained the healthcare facilities in rural and remote locations. The
technological advancement achieved over these years
now allows the diagnosis of various diseases and health monitoring using
miniaturized devices such as smart watches. Moreover, technology has
transformed a hospital-centric healthcare system into a patient-centric system.
For example, several clinical measurements such as blood pressure, blood
glucose level and SpO2 level can be performed at home without the help
of a healthcare professional. Further, the clinical data can be communicated to
healthcare centers from remote areas with the help of advanced
telecommunication services. The use of such communication services in
conjunction with rapidly growing technologies (e.g., machine learning, big
data analysis, Internet of Things (IoT), wireless sensing, mobile computing, and
cloud computing) has improved the accessibility of healthcare facilities.
The existing works are focused on big data and predictive analytics in
healthcare that use classification to predict possible episodes of rises or
falls in the blood sugar level.

Classification in e-health monitoring plays a vital role in the further
treatment of the disease. In a hospital, either the nurse or the doctor has to
move physically from one person to another for health checks, so it may not
be possible to monitor the patients' conditions continuously. Thus, critical
situations cannot be detected easily unless the nurse or doctor checks the
person’s health at that moment. This may be a strain for doctors who have
to take care of a large number of people in the hospital. Also, when medical
emergencies happen, patients are often unconscious and unable to
press an emergency alert button.

3.2 PROPOSED METHOD:


In earlier work, many authors used various machine learning
algorithms to predict diabetes. In this project, we use an IR sensor
to test the sugar level of patients and classify the type of diabetes with the
help of the Random Forest algorithm. An alert for type-2 diabetes is given
by a blinking light or a buzzer. Messages are then sent to doctors as
well as to patients who are suffering from Type-1, Type-2 or Gestational
Diabetes, for further medication.
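
The alert logic described in this section can be sketched in Python as follows. This is only an illustrative sketch: the function name `actions_for`, the label strings and the action names are hypothetical and are not taken from the project's code.

```python
# Hypothetical sketch of the alert logic in the proposed method.
# Label strings and action names are assumptions made for illustration.
DIABETES_LABELS = {"type-1", "type-2", "gestational"}


def actions_for(label):
    """Return the list of actions to take for a predicted class label."""
    actions = []
    if label == "type-2":
        # Local alert: blink the light or sound the buzzer.
        actions.append("local_alert")
    if label in DIABETES_LABELS:
        # Message both the doctor and the patient.
        actions.append("sms_doctor")
        actions.append("sms_patient")
    return actions
```

For example, `actions_for("type-2")` returns the local alert followed by the two messages, while a label outside the three diabetes classes returns an empty list.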
3.3 BLOCK DIAGRAM:

Fig 3.7: Block Diagram of proposed System


CHAPTER 4

HARDWARE COMPONENTS
4.1 Raspberry-pi 3B Model :

The Raspberry Pi is a credit card-sized computer. The Raspberry Pi 3
Model B+ is an improved version of the Raspberry Pi 3 Model B. It is based
on the BCM2837B0 system-on-chip (SoC), which includes a 1.4 GHz quad-core
ARMv8 64-bit processor and a powerful VideoCore IV GPU.

Fig 4.1: Raspberry-pi 3B Model

The Raspberry Pi has an ARM processor that can run Linux. The board used
here is the Raspberry Pi 3B Model, which has 1 GB of
RAM, dual-band WiFi, Bluetooth 4.2, Bluetooth Low Energy (BLE), an
Ethernet port, HDMI output, audio output, RCA composite video output
(through the 3.5 mm jack), four USB ports, and 0.1″-spaced pins that provide
access to general purpose inputs and outputs (GPIO). The Raspberry Pi
requires a microSD card with an operating system on it.

4.2 Features of Raspberry-pi 3B Model :

• Broadcom BCM2837B0, Cortex-A53 (ARMv8) 64-bit SoC @ 1.4GHz.
• 1GB LPDDR2 SDRAM.
• 2.4GHz and 5GHz dual-band WiFi.
• Extended 40-pin GPIO header.
• Full-size HDMI.
• USB 2.0 ports.

4.3 InfraRed(IR) Sensor:

A sensor is an electronic device used to sense changes that occur in the
body and in the environment, such as changes in pigmentation colour,
temperature, humidity and sound. It senses the change and notifies
accordingly. An IR sensor contains an emitter and a detector: the emitter
emits IR rays and the detector detects them. The IR sensor basically consists
of three components:

• IR LED (emitter)

• Photodiode (detector)

• Op-Amp

Fig 4.2:InfraRed Sensor for non-invasive glucose level sensing


IR LED:
It is a light emitting diode which emits IR radiation. The function of
the emitter is to convert electrical energy into light energy. It works on the
principle of recombination of electron-hole pairs.

Fig 4.3 IR Led

Photodiode:

It is a p-n junction diode connected in reverse bias. The
function of this detector is to convert light into electrical energy. It works
effectively only when a certain amount of light (photons) falls on it. If
no light falls on the photodiode, it has a very high (ideally infinite)
resistance and acts as an open switch.

Fig 4.4 :Photo Diode

Op-Amp:

Op-Amp is the short form of operational amplifier. It can be used to
perform many operations such as addition, subtraction, multiplication and
division. The Op-Amp is a DC-coupled high-gain amplifier with two
inputs and a single output.


4.4 Features of IR Sensor :

• The operating voltage is 5VDC

• I/O pins – 3.3V & 5V

• Mounting hole

• The range is up to 20 centimetres

• The supply current is 20mA

• The range of sensing is adjustable

• Fixed ambient light sensor

4.5 GSM Module:

A GSM module is used to establish communication between a computer
and a GSM system. Global System for Mobile communication (GSM) is an
architecture used for mobile communication in most countries.

A GSM module consists
of a GSM modem assembled together with a power supply circuit and
communication interfaces (such as RS-232 or USB) for a computer. GSM
modems also have an IMEI (International Mobile Equipment
Identity) number, similar to mobile phones, for their identification. A GSM
modem can perform the following operations:

1. Receive, send or delete SMS messages in a SIM.

2. Read, add, search phonebook entries of the SIM.

3. Make, receive, or reject a voice call.
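
GSM modems are commonly driven with AT commands sent over the serial interface. As a rough sketch of the first operation (sending an SMS), the function below only builds the byte strings that would be written to the modem in SMS text mode. The function name `build_sms_commands` and the phone number are hypothetical, and a real program would also open the serial port and wait for the modem's response after each write.

```python
CTRL_Z = b"\x1a"  # Ctrl-Z terminates the message body in SMS text mode


def build_sms_commands(number, text):
    """Return the byte strings to write to the modem, in order.

    Sketch only: no serial port is opened and the modem's replies are not
    read here.
    """
    return [
        b"AT\r",                                          # check the modem responds
        b"AT+CMGF=1\r",                                   # select SMS text mode
        b'AT+CMGS="' + number.encode("ascii") + b'"\r',   # set the recipient
        text.encode("ascii") + CTRL_Z,                    # message body + Ctrl-Z
    ]


# Placeholder number and message, for illustration only.
commands = build_sms_commands("+10000000000", "Glucose alert")
```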

Fig 4.3: GSM Module


4.6 Features of GSM module:

• MediaTek MT3329 Chipset, L1 Frequency, C/A code, 66 Channels

• 3m position accuracy

• Jammer detection and reduction

• Data output Baud rate: 9600 bps (Default)

• Low Power Consumption: 55mA @ acquisition, 40mA @ tracking

• High Sensitivity, -165 dBm, TCXO design, superior urban performances

• Patch antenna

• High sensitivity

• DGPS(WAAS/EGNOS/MSAS/GAGAN) support.

4.7 LCD Display :

A Liquid-Crystal Display (LCD) is a flat-panel display or other electronically
modulated optical device that uses the light-modulating properties of liquid
crystals combined with polarizers. Liquid crystals do not emit light directly but
instead use a backlight or reflector to produce images in colour
or monochrome. LCDs are available to display arbitrary images (as in a
general-purpose computer display) or fixed images with low information
content, which can be displayed or hidden.

Fig 4.4: LCD Display


LCDs are used in a wide range of applications, including LCD
televisions, computer monitors, instrument panels, aircraft cockpit displays,
and indoor and outdoor signage. Small LCD screens are common in LCD
projectors and portable consumer devices such as digital
cameras, watches, calculators, and mobile telephones, including smartphones.

4.8 Features of LCD display:

• Cost effective
• Energy efficient
• Space economy
• Reduced radiation
• Lighter weight
• Less eyestrain
• Improved image quality/contrast
• Better screen privacy
• Long life

4.9 Relay :

Relays are electrically operated switches that open and close
circuits on receiving electrical signals from outside sources. Some people may
associate “relay” with a race in which the members of a team take
turns passing a baton to complete the race. The relays embedded in electrical
products work in a similar way: they receive an electrical signal and pass the
signal on to other equipment by turning a switch on and off.

Fig 4.5: Relay Switch


Relays can be of different types, such as electromechanical and solid state.
Electromechanical relays are frequently used. Let us look at the internal parts
of this relay before describing how it works. Although many different types of
relay exist, their working principle is the same. Every electromechanical relay
consists of
• Electromagnet
• Mechanically movable contact
• Switching points and
• Spring
The electromagnet is constructed by winding a copper coil on a metal core.
The two ends of the coil are connected to two pins of the relay.
These two pins are used as the DC supply pins. Generally two more contacts
are present, called switching points, to connect a high-current load. Another
contact, called the common contact, is present in order to connect the switching
points. A relay can be used in either an AC circuit or a DC circuit. In
AC relays, the relay coil gets demagnetized at every current zero crossing,
so there is a chance of the circuit being broken continually.

4.10 Features of relay:

• Lighted Indicator.
• Mechanical Indicator.
• Test Button.
• Resistor.
• Diode, Surge Protection, Varistor and RC Circuit.
• Debounce Delay.
• Magnetic Blowout.


4.11 Buzzer:
An audio signaling device such as a beeper or buzzer may be of
electromechanical, piezoelectric or mechanical type. Its main function is to
convert an electrical audio signal into sound. It is an output device.

Fig 4.6: Buzzer


4.12 Features of Buzzer:

• Colour is black.
• The frequency is 3,300Hz.
• Operating temperature ranges from -20°C to +60°C.
• Operating voltage ranges from 3V to 24V DC.
• The sound pressure level is 85dBA at 10cm.
• The supply current is below 15mA.

4.13 Software Requirements:


We are using machine learning algorithms, which offer the following
advantages:
1. Trend identification

Machine Learning can analyse huge amounts of data and identify particular
trends and patterns not immediately visible to the human eye.
For example, Netflix or Amazon Prime use machine learning techniques to
understand their users’ browsing habits and watch histories in order to provide
them with personalized recommendations.

2. Continual learning

A machine learning model is only as accurate as the data it is provided with.
As algorithms acquire experience, their accuracy and efficiency improve.
As the amount of data grows, algorithms learn to generate more accurate
predictions in a shorter amount of time.

3. Data management

Machine Learning algorithms excel at dealing with multi-dimensional and
multi-variety data. They can do so in dynamic (e.g., data that does not follow
a specific format) or unpredictable situations.

4. Automation

Machine learning does not need constant human attention or supervision. A
programmer only needs to set up the model to be trained and provide
data for training, testing and validation. Once the model is set up, human
intervention is needed at most for diagnosing and remedying errors.

5. Applicability

ML has the advantage of being applicable in almost any field.
Once applied, it can simplify tasks ranging from identifying the target user
base to producing detailed reports, so that a seamless workflow can be created.

4.14 Types of Machine Learning Algorithms:

1. Decision Tree (DT)


2. Naive Bayes (NB)
3. Linear Regression (LR)
4. Logistic Regression (LR)
5. Support Vector Machine (SVM)
6. K-Nearest Neighbour (KNN)
7. K-Means Clustering


4.14.1 Decision Tree (DT):

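The DT entropy is computed as follows: a node k is taken and J class
labels are identified, with the index j ranging from 1 to J. It is given
mathematically as follows:

\[ \mathrm{Entropy}(k) = -\sum_{j=1}^{J} p(j \mid k)\,\log_2 p(j \mid k) \tag{4.1} \]

where p(j | k) is the proportion of samples at node k that belong to class j.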
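
As a small worked illustration of equation (4.1), the sketch below computes the entropy of the class labels at one node. The function name `entropy` and the sample labels are illustrative assumptions, not part of the project's code.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy (base 2) of the class labels found at one tree node."""
    total = len(labels)
    counts = Counter(labels)
    # p(j|k) is estimated as the fraction of samples at the node in class j.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A node with two classes in equal proportion has the maximum entropy of 1 bit.
balanced = entropy([0, 0, 1, 1])
# A node that contains a single class is pure and has zero entropy.
pure = entropy([1, 1, 1])
```

A decision tree prefers splits whose child nodes have lower entropy than the parent node.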
LGBM (LightGBM) can be used in two modes, namely GBDT (Gradient Boosting
Decision Tree) and GOSS (Gradient-based One-Side Sampling). It grows trees
leaf-wise to provide the best fit, whereas most other boosting algorithms
split depth-wise. It is reported to provide better results when compared with
the other existing boosting algorithms.

4.14.2 Naive_Bayes (NB) :

NB is a classification method that uses conditional probability
values to divide the data. It is also used to detect the
behavior of the various patients involved. It is mainly used on large
datasets. It is a collaborative classification model involving Logistic
Regression for classifying patients' data into different groups. It is good
for real-time prediction, multiclass prediction, recommendation systems,
text-based classification, and sentiment analysis.

The Bayesian formula used by the Naive Bayes algorithm is as follows:

\[ \Pr(A \mid B) = \frac{\Pr(B \mid A)\,\Pr(A)}{\Pr(B)} \tag{4.2} \]

where Pr(A | B) = posterior probability, Pr(B | A) = likelihood,
Pr(A) = class prior probability, and Pr(B) = predictor prior probability.
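
The use of the posterior, likelihood and prior terms can be illustrated with a short numeric sketch of Bayes' rule. All probabilities below are made-up values chosen for illustration, and the function name `posterior` is hypothetical.

```python
def posterior(likelihood, prior, evidence):
    """Bayes' rule: Pr(A|B) = Pr(B|A) * Pr(A) / Pr(B)."""
    return likelihood * prior / evidence


# Made-up example values (assumptions, not measured data).
p_a = 0.10              # Pr(A): class prior probability
p_b_given_a = 0.80      # Pr(B|A): likelihood of the predictor in class A
p_b_given_not_a = 0.20  # Pr(B|not A)

# Pr(B) by the law of total probability.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = posterior(p_b_given_a, p_a, p_b)  # about 0.31
```

Naive Bayes applies this rule to each class, treats the predictors as independent given the class, and picks the class with the largest posterior.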

Many machine learning methods and techniques can be tested and used along
with classifiers for diabetes prediction. However, for the dataset used,
the best suited classifiers are considered to be the Gradient Boosting
classifiers (GBM, LGBM, and XGB) from Table …. and the Decision Tree, based on
the simulation mechanism used earlier. Other classifiers such as Random Forest,
Naive Bayes, and Support Vector Machine are also considered for the final
accuracy percentage analysis.

4.14.3 Linear Regression :

To understand how Linear Regression works, imagine
how you would arrange random logs of wood in increasing order of their
weight. There is a catch, however: you cannot weigh each log. You have to
guess its weight just by looking at the height and girth of the log (visual
analysis) and arrange the logs using a combination of these visible parameters.
This is what linear regression in machine learning is like.

In this process, a relationship is established between independent and
dependent variables by fitting them to a line. This line is known as the
regression line and is represented by the linear equation Y = a * X + b,
where a is the slope and b is the intercept.
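
A minimal sketch of fitting the regression line Y = a * X + b by ordinary least squares is given below. The function name `fit_line` and the sample data are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b of the line Y = a * X + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Points that lie exactly on Y = 2 * X + 1.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```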

4.14.4 Logistic Regression :

Logistic Regression is used to estimate discrete values (usually binary values
like 0/1) from a set of independent variables. It helps predict the probability of
an event by fitting data to a logit function. It is also called logit regression.
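
The way a fitted logistic regression model turns feature values into a probability and then into a 0/1 decision can be sketched as below. The weights and bias are made-up values rather than fitted ones, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of class 1 for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict(features, weights, bias, threshold=0.5):
    """Binary decision obtained by thresholding the probability."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Made-up model with a single feature: weight 1.0 and bias -1.0.
label_high = predict([2.0], [1.0], -1.0)  # z = 1.0, probability above 0.5
label_low = predict([0.0], [1.0], -1.0)   # z = -1.0, probability below 0.5
```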

The methods listed below are often used to help improve logistic regression
models:

• include interaction terms

• eliminate features

• apply regularization techniques

• use a non-linear model


4.14.5 SVM (Support Vector Machine) Algorithm :

SVM is a classification algorithm in which you
plot the raw data as points in an n-dimensional space (where n is the number of
features you have). The value of each feature is then tied to a particular
coordinate, making it easy to classify the data. Lines called classifiers
(hyperplanes, when there are more than two features) can be
used to split the data into classes.
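
Once a linear SVM has been trained, classifying a point reduces to checking on which side of the separating line it falls. The sketch below shows only this decision step; the values of `w` and `b` are made up, and finding them (the training step, which maximizes the margin) is not shown.

```python
def decision_value(point, w, b):
    """Signed score w . x + b of a point relative to the separating line."""
    return sum(wi * xi for wi, xi in zip(w, point)) + b


def classify(point, w, b):
    """Return +1 on one side of the line and -1 on the other."""
    return 1 if decision_value(point, w, b) >= 0 else -1


# Made-up separating line x1 + x2 - 3 = 0 in a two-feature space.
w, b = [1.0, 1.0], -3.0
above = classify([3.0, 2.0], w, b)   # score  2.0, class +1
below = classify([1.0, 1.0], w, b)   # score -1.0, class -1
```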

4.14.6 KNN (K- Nearest Neighbours) Algorithm :

This algorithm can be applied to both classification and regression problems.
Within the data science industry, it is more widely used to solve
classification problems. It is a simple algorithm that stores all available cases
and classifies any new case by taking a majority vote of its k nearest
neighbours. The case is then assigned to the class with which it has the most
in common. A distance function performs this measurement.

KNN can be easily understood by comparing it to real life. For example, if you
want information about a person, it makes sense to talk to his or her friends
and colleagues!
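
The majority vote among the k nearest neighbours can be sketched as follows, using Euclidean distance as the distance function. The function name `knn_predict`, the points and the labels are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by a majority vote of its k nearest training points."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        ((math.dist(p, query), label)
         for p, label in zip(train_points, train_labels)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up two-feature data with two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels = ["low", "low", "low", "high", "high", "high"]
near_low = knn_predict(points, labels, (2, 2), k=3)
near_high = knn_predict(points, labels, (8.5, 8.5), k=3)
```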

Things to consider before selecting the K-Nearest Neighbours algorithm:

• KNN is computationally expensive

• Variables should be normalized, or else higher-range variables can bias the
algorithm

• Data still needs to be pre-processed.

4.14.7 K-Means Clustering :

It is an unsupervised learning algorithm that solves clustering problems. Data
sets are partitioned into a particular number of clusters (let's call that
number K) in such a way that the data points within a cluster are homogeneous,
and heterogeneous with respect to the data in other clusters.


How K-means forms clusters:

• The K-means algorithm picks K points, called centroids, one for
each cluster.

• Each data point forms a cluster with the closest centroid, giving K clusters.

• It now creates new centroids based on the existing cluster members.

• With these new centroids, the closest centroid for each data point is
determined again. This process is repeated until the centroids do not change.
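
The steps above can be sketched in plain Python as follows. To keep the sketch deterministic, the initial centroids are passed in by the caller instead of being picked at random; the function name `kmeans` and the sample points are illustrative assumptions.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Cluster `points` around the given initial centroids."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its closest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)  # keep a centroid that has no members
        if new_centroids == centroids:
            break  # centroids did not change, so the algorithm has converged
        centroids = new_centroids
    return centroids, clusters


# Made-up data: two groups of three points each, K = 2.
data = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
final_centroids, final_clusters = kmeans(data, [(1.0, 1.0), (8.0, 8.0)])
```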

4.15 Random Forest Algorithm:

Random Forest is a supervised machine learning algorithm
that is extremely popular and is used for classification and regression
problems in machine learning.

A forest comprises numerous trees, and the more trees it has, the more
robust it is. Similarly, a greater number of trees in a Random
Forest generally gives higher accuracy and problem-solving ability.

Random Forest is a classifier that contains several decision trees built on
various subsets of the given dataset and combines their outputs, by majority
vote or averaging, to improve the predictive accuracy on that dataset. It is
based on the concept of ensemble learning, which is a process of combining
multiple classifiers to solve a complex problem and improve the performance
of the model.


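[Figure: the training set is divided into training data 1, training data 2, …,
training data n; each subset trains one decision tree (decision tree 1,
decision tree 2, …, decision tree n); the outputs of the trees are combined by
voting (averaging) to give the prediction for the testing set.]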

Fig 4.7 Working of Random Forest Algorithm
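
The working shown in Fig 4.7 can be sketched in plain Python as below. This is a simplified illustration and not the project's implementation: each "tree" is only a one-level decision tree (a decision stump) trained on a bootstrap sample with one randomly chosen feature, and all function names and data are made up. In practice a library implementation with full decision trees would normally be used.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent label in a list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows, labels, feature):
    """One-level decision tree: best single threshold on one feature."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
        right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
        if not left or not right:
            continue
        l_lab, r_lab = majority(left), majority(right)
        errors = (sum(lab != l_lab for lab in left)
                  + sum(lab != r_lab for lab in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, l_lab, r_lab)
    if best is None:  # no usable split: always predict the majority label
        lab = majority(labels)
        return (feature, rows[0][feature], lab, lab)
    return (feature,) + best[1:]


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample with a randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))           # random feature choice
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Majority vote over the predictions of all stumps."""
    return majority([left if row[feature] <= threshold else right
                     for feature, threshold, left, right in forest])


# Made-up, well-separated two-feature data with class labels 0 and 1.
rows = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 8), (9, 8), (8, 9), (9, 9)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = train_forest(rows, labels)
```

Because every stump sees a different bootstrap sample and feature, individual stumps may disagree, and the vote smooths out their individual errors.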

4.16 Features of a Random Forest Algorithm:

• It is generally more accurate than a single decision tree.
• It provides an effective way of handling missing data.
• It can produce a reasonable prediction without hyper-parameter tuning.
• It reduces the problem of overfitting in decision trees.
• In every random forest tree, a subset of features is selected randomly at the
node’s splitting point.


CHAPTER 6
6.1 Conclusion
In this work, a healthcare monitoring system for diabetic patients is
proposed which monitors the sugar level. It helps doctors and patients'
relatives to monitor and store the glucose level of patients. It gives an alert
to doctors and patients' relatives when the patient's reading is at the type-2
diabetes level. Using the Internet, the data can be made available for remote
use, and only to authorized users such as remote specialist doctors, for
special advice. Thus design parameters like
availability, security, correctness and efficiency are achieved successfully.
Nearly 95% correct results are achieved when compared to standard clinical
methods and commercial instruments like CGMs. Thus the system can be
helpful for monitoring the sugar level of diabetic patients. The use of this
system can be extended to caring for and monitoring elderly people staying
alone at their homes, and also to caring for parents.

6.2 Future Scope

In future, the communication can be made collaborative by adding two-way
communication protocols for IoT so that doctors can monitor and advise
patients online. Similarly, patients can put their queries to remote doctors as
well. The system can also include patient details such as age, time, date and
gender. Further, Machine Learning algorithms can be compared with Deep
Learning algorithms with the aim of achieving 100% accuracy in the system.
