SUBMITTED BY
Ankit Sharma(0901AD211006)
Ayush Goyal (0901AD211007)
Ayushi Verma (0901AD211008)
Chandan Jat (0901AD211009)
Devanshi Rathore(0901AD211010)
4th SEMESTER
Artificial Intelligence And Data Science
SUBMITTED TO
Prof. Vibha Tiwari
I hereby declare that the mini skill-based project for the course Machine Learning &
Optimization (270404) is being submitted in partial fulfilment of the requirements for the
award of Bachelor of Technology in Artificial Intelligence And Data Science.
All the information in this document has been obtained and presented in accordance with
academic rules and ethical conduct.
Date : 15-03-2023
Place: Gwalior
Ankit Sharma(0901AD211006)
Ayush Goyal (0901AD211007)
Ayushi Verma (0901AD211008)
Chandan Jat (0901AD211009)
Devanshi Rathore(0901AD211010)
ACKNOWLEDGEMENT
I would like to express my greatest appreciation to all the individuals who have helped and
supported me throughout this lab file. I am thankful to the whole Information Technology
department for their ongoing support during the experiments, from initial advice and provision
of contacts in the first stages through ongoing advice and encouragement, which led to the final
report of this lab file.
A special acknowledgement goes to my colleagues, who helped me in completing the file by
exchanging interesting ideas to deal with problems and by sharing their experience.
I wish to thank our professor Vibha Tiwari as well for her undivided support and interest,
which inspired me and encouraged me to go my own way, and without which I would have been
unable to complete my project.
Finally, I want to thank my friends, who showed appreciation for my work and motivated
me to continue it.
Ankit Sharma(0901AD211006)
Ayush Goyal (0901AD211007)
Ayushi Verma (0901AD211008)
Chandan Jat (0901AD211009)
Devanshi Rathore(0901AD211010)
CHAP 1 – INTRODUCTION
1.1 PROBLEM STATEMENT
Diabetes is a chronic illness that arises when the body cannot produce insulin, or when the
body cannot use the insulin that it produces [1]. The effects of diabetes mellitus include
long-term damage, dysfunction and failure of various organs (WHO). As a result, it
significantly increases mortality in patients. There are two main types of diabetes: Type
I (T1) and Type II (T2). T1 occurs when the body is no longer able to produce insulin;
it is common in childhood and is also known as juvenile diabetes. This form of
diabetes is less common; only about 5-10% of people with diabetes have T1 (American
Diabetes Association, 2010). T2 occurs when the body is unable to utilize the insulin
produced, or when not enough insulin is produced [9, 10, 11]. In addition, there is another
type of diabetes, gestational diabetes, which develops during pregnancy. Too much
glucose in the blood can damage the eyes, kidneys, and nerves. It can also cause heart disease,
stroke, and insufficient blood flow to the legs. Being overweight, lack of exercise, family
history and stress increase the risk of diabetes [14, 15]. In Bangladesh, health awareness
is low. There are 7.1 million cases of diabetes in Bangladesh, and the number is
rising. Many people are unaware of the condition and do not get tested
for it.
Regression is a supervised learning algorithm in machine learning which is used for
prediction by learning a relationship between existing statistical data and a target
value, i.e., Sale Price in this case. Different factors are taken into consideration while
predicting the worth of a house, such as location, neighbourhood and amenities like
garage space. Learning is applied to these parameters together with target values for a
particular geographical region, because areas differ in land price, housing style, materials
used, and availability of public utilities.
The domain problem of a machine learning binary classification model to predict whether a
person is diabetic or not falls under the umbrella of healthcare and medical informatics.
Diabetes is a chronic condition that affects the way the body processes blood sugar, and it can
lead to serious complications such as heart disease, stroke, and kidney failure if left untreated.
The goal of a binary classification model in this domain is to accurately predict whether a
person has diabetes based on their medical history, physical examination, and other relevant
factors. The model would be trained on a dataset of individuals with and without diabetes, and
it would use features such as age, body mass index (BMI), blood pressure, and blood glucose
levels to make its predictions.
The development of a binary classification model for diabetes diagnosis is important because
it can help healthcare providers to make more accurate and timely diagnoses, which in turn can
lead to better treatment outcomes and improved quality of life for patients. Additionally, such
models can help identify high-risk individuals who may benefit from preventive interventions
or lifestyle modifications to reduce their risk of developing diabetes.
The goal of this paper is to find a model that predicts diabetes with better accuracy. We
experimented with different classification and ensemble algorithms to predict diabetes. In the
following, we briefly discuss the phases of the project. We use several types of
algorithms, each built on its own mathematical formulation. The project uses
data collected from different samples world-wide, which we split into
training and testing data. First, data cleaning and pre-processing are performed on the data.
Ordinal encoding is performed to convert the different ranges of Quality data into two
categories. In model building, the final model is selected from the different models and
algorithms based on an evaluation benchmark. Various graphs and plots are also drawn for a
better understanding of the data.
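The split, training and model selection steps described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it uses scikit-learn, synthetic stand-in data shaped like the dataset (768 rows, 8 features, binary target), and an assumed 80/20 split; the names `models`, `scores` and `best_name` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in data, NOT the real dataset.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, random_state=42
)

# Assumption: 80/20 split, stratified so both classes appear in each part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# One linear classifier and one ensemble classifier to compare.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}

# Fit each model on the training part and score it on the held-out part.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Select the model with the highest test accuracy.
best_name = max(scores, key=scores.get)
print(scores, best_name)
```

Accuracy is used as the selection benchmark here only because it is the score reported in this report; any other metric could be substituted in the loop.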
2.2 Data Sources and their formats
The data is gathered from the UCI repository, where it is named the Pima Indian Diabetes
Dataset. The dataset has several attributes for each of 768 patients. The 9th attribute is the
class variable of each data point. This class variable takes the value 0 or 1, indicating
whether the patient is negative or positive for diabetes.
Tail:
2.4.2 Count
It counts the number of non-null rows in each column.
2.4.3 Describe
2.4.4 Nunique
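The inspection steps named in this section (tail, count, describe, nunique) correspond to pandas DataFrame methods. The sketch below shows them on a few made-up rows; the values are not taken from the real dataset, and the column names are an assumption about how the CSV is labelled.

```python
import io

import pandas as pd

# Made-up rows for illustration only (one BMI value is left missing).
# Column names are an assumption, not confirmed by this report.
csv_text = """Glucose,BloodPressure,BMI,Age,Outcome
120,70,30.5,45,1
95,64,24.0,29,0
160,80,,52,1
95,70,27.3,23,0
130,64,35.1,38,1
"""

df = pd.read_csv(io.StringIO(csv_text))

last_rows = df.tail(2)    # the last two rows of the frame
non_null = df.count()     # number of non-null entries per column
summary = df.describe()   # count, mean, std, min, quartiles, max per column
distinct = df.nunique()   # number of distinct non-null values per column

print(last_rows)
print(non_null)
print(summary)
print(distinct)
```

Because `count` ignores missing entries, the BMI column reports one row fewer than the others, which makes it a quick check for missing values.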
Hardware Used -
1. Processor — Intel i5 processor
2. RAM—8GB
Software utilised –
So we will not remove the outliers and will just continue with them.
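For reference, outliers of this kind are commonly flagged with the 1.5 x IQR rule that boxplots use for their whiskers. The sketch below assumes that rule and uses made-up readings; it flags the extreme value but, in line with the decision above, does not drop it.

```python
import numpy as np

# Made-up readings for illustration; 300 is deliberately extreme.
values = np.array([85, 90, 99, 100, 105, 110, 120, 300])

# First and third quartiles and the interquartile range.
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Boxplot whisker limits: points beyond them count as outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

is_outlier = (values < lower) | (values > upper)
outliers = values[is_outlier].tolist()
print(outliers)

# The flagged points are kept: the data is left unchanged.
kept = values
```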
Chap 4- Conclusion
The main aim of this project was to design and implement diabetes prediction using machine
learning methods and to analyse the performance of those methods, and it has been achieved
successfully. The proposed approach uses Logistic Regression and Gradient Boosting
classifiers as its classification and ensemble learning methods, with boxplots and distplots
for exploring the data and the confusion matrix, accuracy score, ROC curve and AUC score
for evaluation. A classification accuracy of 0.77 (77%) has been achieved. The experimental
results can assist health care providers in making early predictions and early decisions to
treat diabetes and save human lives.
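The evaluation measures named in this conclusion (confusion matrix, accuracy score and AUC score) can be computed with scikit-learn as in the sketch below. The labels and predicted probabilities are made up for illustration and are not the project's results; the 0.5 decision threshold is also an assumption.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

# Made-up true labels (1 = diabetic) and predicted probabilities of class 1.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9])

# Assumption: a probability of 0.5 or more is predicted as class 1.
y_pred = (y_prob >= 0.5).astype(int)

# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)

# Fraction of predictions that match the true label.
accuracy = accuracy_score(y_true, y_pred)

# AUC is computed from the probabilities, not the thresholded labels.
auc = roc_auc_score(y_true, y_prob)

print(cm)
print(accuracy, auc)
```

Accuracy depends on the chosen threshold, while the AUC score summarises ranking quality across all thresholds, which is why the two are usually reported together.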