OF TECHNOLOGY
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
BATCH NO : A2
DONE BY :
1. Dikshith Roshan T (18TD0618)
2. Haritha R (18TD0629)
3. Ranjani R (18TD0664)
(IV year CSE - A)
PROJECT GUIDE :
Dr. N. POONGUZHALI
ASSOCIATE PROFESSOR
DEPARTMENT OF CSE
Date : 07/07/2022
AGENDA
• ABSTRACT
• INTRODUCTION
• PROBLEM DEFINITION
• OBJECTIVE
• LITERATURE SURVEY
• EXISTING SYSTEM
• DRAWBACKS OF EXISTING SYSTEM
• PROPOSED SYSTEM
• DIABETIC INSIGHT AND DELICIOUS DIET IN MEDCHAIN(DIDDM)
• MODULE I - DIABETIC INSIGHT
• MODULE II - DIABETIC DIET PLAN
• MODULE III - MEDCHAIN
• DATASET DESCRIPTION
• RESULT ANALYSIS
• CONCLUSION
• REFERENCES
ABSTRACT
• In the modern world, diabetes has become one of the most prevalent
diseases. We use machine learning to determine whether a patient has
diabetes, employing the Random Forest algorithm to improve prediction
accuracy. By prescribing a diet plan based on a patient's age and BMI
category, we also aim to help keep their diabetes under control. All
patient-provided data is kept in an electronic health record (EHR) and
secured using the InterPlanetary File System (IPFS) together with a
private blockchain, which are used to store and access patient data.
INTRODUCTION
• Diabetes is a disease that occurs when your blood glucose is too high.
• Insulin, a hormone made by the pancreas, helps glucose from food get into your
cells to be used for energy.
• Sometimes your body doesn’t make enough insulin or doesn’t use insulin
well. Over time, having too much glucose in your blood can cause health
problems. This condition is known as diabetes.
• Although diabetes has no cure, you can take steps to manage your diabetes and
stay healthy.
Causes of diabetes:
• Obesity and an inactive lifestyle are two of the most common causes of type 2
diabetes. Type 2 diabetes accounts for about 90% to 95% of diabetes cases.
MACHINE LEARNING
• Machine learning (ML) is the study of computer algorithms that can
improve automatically through experience and by the use of data.
• Random forest is a supervised machine learning algorithm that is
widely used for classification and regression problems. It builds
decision trees on different samples of the data and takes their majority
vote for classification, or their average for regression.
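As a minimal sketch of the aggregation step described above, the following Python snippet combines per-tree outputs by majority vote (classification) and by averaging (regression). The per-tree outputs and the function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_votes):
    """Return the class label predicted by most trees (majority vote)."""
    return Counter(tree_votes).most_common(1)[0][0]


def aggregate_regression(tree_outputs):
    """Return the average of the trees' numeric outputs."""
    return mean(tree_outputs)


# Hypothetical outputs from five trees for one patient.
votes = ["diabetic", "non-diabetic", "diabetic", "diabetic", "non-diabetic"]
print(aggregate_classification(votes))              # diabetic
print(aggregate_regression([1.0, 2.0, 3.0, 2.0]))   # 2.0
```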
ELECTRONIC HEALTH RECORD
• An Electronic Health Record (EHR) is a digital version of a patient's
paper chart. EHRs are real-time, patient-centred records that make
information available instantly and securely to authorized users.
BLOCKCHAIN
• A blockchain is a collection of records (blocks) that are strongly
linked to one another, which makes them resistant to alteration. It uses
a peer-to-peer network for sharing information.
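The linking of records can be illustrated with a small hash chain. This is only a sketch under stated assumptions: the block fields (`index`, `data`, `prev_hash`, `hash`) and the function names are hypothetical, and no peer-to-peer network or consensus is modelled.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 over the block's contents (excluding its own hash field)."""
    payload = {k: block[k] for k in ("index", "data", "prev_hash")}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


def add_block(chain, data):
    """Append a block that stores the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {"index": len(chain), "data": data, "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    chain.append(block)
    return block


def is_valid(chain):
    """Check every block's own hash and its link to the previous block."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


chain = []
add_block(chain, "record A")
add_block(chain, "record B")
add_block(chain, "record C")
print(is_valid(chain))        # True
chain[1]["data"] = "tampered"
print(is_valid(chain))        # False
```

Because each block stores the hash of its predecessor, changing any earlier record invalidates the chain unless every later block is recomputed.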
[Figure: linked blocks in a chain (labels shown: Block II, Block IV)]
PROBLEM DEFINITION
• To develop a diabetes prediction system using a machine learning
technique and to provide a diet plan based on the severity of diabetes.
The patient medical records (EHR) are secured using a private blockchain
and are accessible only to authorized users.
OBJECTIVE
• To predict a patient's diabetic status by gathering the patient's
medical information, including Name, Age, Insulin, Glucose Level, and
BMI, and applying the Random Forest machine learning algorithm
• To suggest a diet plan for the patient based on the medical data
obtained during the diagnosis of diabetes
• To secure the EHR that contains the patient's medical information using
the InterPlanetary File System (IPFS) with a private blockchain
LITERATURE SURVEY
EXISTING SYSTEM
• The existing system uses a machine learning methodology to detect
persons at increased risk of type 2 diabetes or prediabetes among people
without known abnormal glucose regulation. The parameters used in the
existing system were BMI, waist-hip ratio, age, systolic and diastolic
blood pressure, and diabetes heredity.
DRAWBACKS OF EXISTING SYSTEM
PROPOSED SYSTEM
• To develop a prediction system for diabetes using a machine learning
model based on the Random Forest algorithm. The system also recommends a
diet plan for the patient based on insulin level, age and BMI. The
patient medical records (EHR) are secured using a private blockchain and
are accessible only to authorized users.
DIABETIC INSIGHT AND DELICIOUS DIET IN MEDCHAIN(DIDDM)
[Figure: DIDDM architecture. Diabetic Insight: input, data cleaning, data visualization, Random Forest algorithm, prediction output. Diabetic Diet Plan: condition clause on age and BMI range; patient medical details stored in the EHR. MedChain: EHR, validator (yes: transaction process, Blocks I to IV; no: denied).]
MODULE I - DIABETIC INSIGHT
PREDICTION OF DIABETES USING RANDOM FOREST ALGORITHM
PREDICTION MODULE DIAGRAM
[Figure: input, data cleaning, data visualization, Random Forest algorithm, output]
PREDICTION MODULE
INPUT :
• A dataset of diabetic patients is the input for our project.
• A dataset is a collection of instances that share a common set of attributes.
• In Machine Learning projects, we need a training dataset to train our model.
• In general, the more data you provide to the ML system, the better the model
can learn and improve.
PARAMETERS IN DATASET
• Glucose Level
• Diabetes Pedigree Function
• Blood Pressure
• Skin Thickness
• Age
• Pregnancies
• Insulin Level
• BMI
DATA CLEANING :
The main aim of data cleaning is to identify and remove errors and
duplicate data in order to create a reliable dataset.
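A minimal sketch of this cleaning step is shown below. It assumes each record is a dictionary of attribute values; the field names, the sample values and the rule that a missing or non-numeric value counts as an error are assumptions made for illustration, not the project's actual cleaning rules.

```python
def clean_records(records, required_fields):
    """Drop exact duplicates and records with a missing or non-numeric required field."""
    seen = set()
    cleaned = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key in seen:
            continue  # exact duplicate of an earlier record
        if any(not isinstance(record.get(f), (int, float)) for f in required_fields):
            continue  # missing or invalid value
        seen.add(key)
        cleaned.append(record)
    return cleaned


# Hypothetical sample records.
records = [
    {"glucose": 148, "bmi": 33.6, "age": 50},
    {"glucose": 148, "bmi": 33.6, "age": 50},   # duplicate
    {"glucose": None, "bmi": 26.6, "age": 31},  # missing glucose value
    {"glucose": 85, "bmi": 26.6, "age": 31},
]
cleaned = clean_records(records, ["glucose", "bmi", "age"])
print(len(cleaned))  # 2
```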
DATA VISUALIZATION :
• Data visualization is the graphical representation of information and
data.
• Data visualization helps to analyze data quickly and efficiently.
• It is important to understand how data is used in a particular Machine
Learning model, and visualization helps in analyzing it.
[Figure: error-free data is passed to data visualization]
RANDOM FOREST ALGORITHM :
Random forest builds multiple decision trees and merges them together to
get a more accurate and stable prediction.
STEPS :
STEP 1 : Pick N random records from the dataset.
STEP 2 : Build a decision tree based on these N records.
STEP 3 : Choose the number of trees you want in your algorithm and repeat
steps 1 and 2.
STEP 4 : Each decision tree predicts the output using its own subset of the data.
STEP 5 : The final output is based on the majority of the outputs from the
decision trees.
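The steps above can be sketched in plain Python as follows. To keep the example short, each "tree" is a one-level decision stump on one randomly chosen feature rather than a full decision tree, and records are drawn with replacement; both choices, along with the toy records and all function names, are assumptions for illustration and not the project's implementation.

```python
import random
from collections import Counter


def bootstrap_sample(records, n, rng):
    """Step 1: pick n random records (with replacement)."""
    return [rng.choice(records) for _ in range(n)]


def train_stump(sample, rng):
    """Step 2 (simplified): fit a one-level tree on one randomly chosen feature.

    Each record is (features, label). Returns (feature, threshold,
    label_if_at_or_below_threshold, label_if_above_threshold).
    """
    feature = rng.randrange(len(sample[0][0]))
    default = Counter(y for _, y in sample).most_common(1)[0][0]
    best = None
    for t in sorted({x[feature] for x, _ in sample}):
        left = [y for x, y in sample if x[feature] <= t]
        right = [y for x, y in sample if x[feature] > t]
        left_label = Counter(left).most_common(1)[0][0] if left else default
        right_label = Counter(right).most_common(1)[0][0] if right else default
        correct = left.count(left_label) + right.count(right_label)
        if best is None or correct > best[0]:
            best = (correct, feature, t, left_label, right_label)
    return best[1:]


def stump_predict(stump, x):
    """Step 4: a single tree predicts a label for one feature vector."""
    feature, t, left_label, right_label = stump
    return left_label if x[feature] <= t else right_label


def train_forest(records, n_trees, sample_size, seed=0):
    """Step 3: repeat steps 1 and 2 for the chosen number of trees."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(records, sample_size, rng), rng)
            for _ in range(n_trees)]


def forest_predict(forest, x):
    """Step 5: the final output is the majority vote over all trees."""
    votes = Counter(stump_predict(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]


# Hypothetical toy records: (glucose, BMI) -> 1 (diabetic) or 0 (non-diabetic).
records = [
    ((85, 22), 0), ((90, 24), 0), ((95, 21), 0), ((100, 23), 0),
    ((150, 31), 1), ((160, 33), 1), ((170, 35), 1), ((180, 30), 1),
]
forest = train_forest(records, n_trees=25, sample_size=8)
print(forest_predict(forest, (80, 20)), forest_predict(forest, (200, 40)))
```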
WORKFLOW OF RANDOM FOREST ALGORITHM
[Figure: dataset, majority voting, final prediction]
MODULE II - DIABETIC DIET PLAN
DIET PLAN AND STORING PATIENT RECORD IN EHR
DIET PLAN
• Based on the output of the prediction process, we provide a diet plan for the
patient.
[Figure: diet plan decision tree. Each age group (1-18, 19-50, above 50) is split by BMI range (below 18.5, 18 to 24, above 25), giving Diet Plans 1 to 9.]
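The selection of a diet plan by age group and BMI range can be sketched as a pair of condition clauses. The age groups and BMI ranges follow the diet plan slide; the exact cut-off handling at the boundaries and the numbering of plans per combination (plans 1 to 3 for the youngest group, 4 to 6 for the middle group, 7 to 9 for the oldest) are assumptions, since the slides do not state them.

```python
def age_group(age):
    """0 for ages up to 18, 1 for 19 to 50, 2 for above 50."""
    if age <= 18:
        return 0
    if age <= 50:
        return 1
    return 2


def bmi_category(bmi):
    """0 for BMI below 18.5, 1 for 18.5 to 25, 2 for above 25 (assumed cut-offs)."""
    if bmi < 18.5:
        return 0
    if bmi <= 25:
        return 1
    return 2


def diet_plan_number(age, bmi):
    """Map an (age, BMI) pair to a plan number from 1 to 9 (assumed numbering)."""
    return age_group(age) * 3 + bmi_category(bmi) + 1


print(diet_plan_number(10, 17.0))  # 1
print(diet_plan_number(30, 22.0))  # 5
print(diet_plan_number(65, 30.0))  # 9
```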
STORING PATIENT DETAILS IN DATABASE
• We created a database named flaskapp to store the patient details
given by the user. This database serves as the Electronic Health Record (EHR).
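A minimal sketch of storing patient details is given below. The slides do not name the database engine or schema, so an in-memory SQLite database stands in for flaskapp here, and the table name, column names and sample patient are hypothetical. The fields follow those listed in the objective (Name, Age, Insulin, Glucose Level, BMI).

```python
import sqlite3


def create_ehr_table(conn):
    """Create a hypothetical patients table if it does not exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS patients (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               name TEXT NOT NULL,
               age INTEGER NOT NULL,
               insulin REAL,
               glucose REAL,
               bmi REAL
           )"""
    )


def save_patient(conn, name, age, insulin, glucose, bmi):
    """Insert one patient record and return its row id."""
    cur = conn.execute(
        "INSERT INTO patients (name, age, insulin, glucose, bmi) VALUES (?, ?, ?, ?, ?)",
        (name, age, insulin, glucose, bmi),
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")  # stand-in for the flaskapp database
create_ehr_table(conn)
patient_id = save_patient(conn, "Example Patient", 45, 80.0, 140.0, 27.5)
row = conn.execute(
    "SELECT name, age, bmi FROM patients WHERE id = ?", (patient_id,)
).fetchone()
print(row)  # ('Example Patient', 45, 27.5)
```

Parameterized queries (the `?` placeholders) are used so that user-supplied values are never spliced into the SQL text.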
MODULE III - MEDCHAIN
BLOCKCHAIN BASED ELECTRONIC HEALTH RECORD
INTERPLANETARY FILE SYSTEM (IPFS)
• IPFS gives a unique hash value to each file.
• The hash is completely different even if the content differs by only a
single character.
• IPFS is a file sharing system that can be leveraged to store and share
large files more efficiently.
• It relies on cryptographic hashes that can easily be stored on a blockchain.
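The effect of a one-character change on a hash can be seen with a short sketch. Plain SHA-256 from Python's hashlib is used here as a stand-in; an actual IPFS identifier is encoded differently, and the sample byte strings are hypothetical.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


a = content_hash(b"glucose=148")
b = content_hash(b"glucose=149")  # differs by a single character
print(a != b)  # True
differing = sum(1 for x, y in zip(a, b) if x != y)
print(differing, "of", len(a), "hex digits differ")
```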
• CONTENT ADDRESSING
IPFS uses content addressing to identify content by what's in it rather than by where it's
located. Each piece of content in IPFS has a content identifier, or CID, which is derived from its hash. The hash is unique to the
content that it came from, even though it may look short compared to the original content. IPLD
translates between hash-linked data structures, allowing for the unification of the data across
distributed systems.
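Content addressing can be sketched as a small in-memory store whose keys are hashes of the stored data. This is an illustrative toy, not the IPFS API: the `ContentStore` class and its methods are hypothetical, and a SHA-256 hex digest stands in for a CID.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: data is looked up by the hash of its content."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        """Store the data and return its identifier (hash of the content)."""
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        """Fetch data by identifier and verify that it still matches the hash."""
        data = self._blobs[key]
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("content does not match its identifier")
        return data

    def __len__(self):
        return len(self._blobs)


store = ContentStore()
key1 = store.put(b"patient record v1")
key2 = store.put(b"patient record v1")  # same content, same identifier
print(key1 == key2, len(store))         # True 1
print(store.get(key1))                  # b'patient record v1'
```

Because the identifier depends only on the content, identical data is stored once, and any retrieved data can be checked against the identifier used to request it.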
• STEP 3 : We can access the EHR in the IPFS gateway using the hash
value
WORKFLOW OF SECURING EHR USING IPFS
[Figure: workflow labels: data are broken into blocks; upload encrypted data to IPFS; download using hash value; distributed ledger]
RESULT ANALYSIS
DATA SET DESCRIPTION
S.NO | ATTRIBUTE | DESCRIPTION OF ATTRIBUTES | MEAN VALUE | STD VALUE
1 | Pregnancies | Number of times | 0.2262 | 0.19
• Reading latency
• Writing latency
ACCURACY COMPARISON OF ML ALGORITHMS
[Figure: bar chart. X axis: algorithms (KNN, Naive Bayes, Gradient Boosting, Random Forest). Y axis: accuracy, 0 to 100.]
CONFUSION MATRIX OF RANDOM FOREST ALGORITHM
                 PREDICTED YES    PREDICTED NO
ACTUAL YES       TP               FN
ACTUAL NO        FP               TN
ACCURACY = (TP + TN) / (TP + TN + FN + FP)
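The accuracy formula can be checked with a short calculation. The counts below are hypothetical and are not the project's results.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FN + FP)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


# Hypothetical counts: 90 of 100 predictions are correct.
print(accuracy(tp=50, tn=40, fp=5, fn=5))  # 0.9
```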
INTERPLANETARY FILE SYSTEM
FILE SYSTEM | PROTOCOL | VERSION | OPERATIONS
WRITING LATENCY
[Figure: two charts comparing IPFS and FTP by file size. "IPFS and FTP writing small data": file sizes 1kb to 256kb. "IPFS and FTP writing large data": file sizes 1mb to 64mb.]
READING LATENCY
[Figure: two charts comparing IPFS and FTP by file size, with latency time (ms) on the Y axis. "IPFS and FTP reading small data": file sizes 1kb to 256kb. "IPFS and FTP reading large data": file sizes 1mb to 64mb.]
CONCLUSION
• The primary application of this system is to predict whether a patient
is diabetic using the Random Forest algorithm, which achieved an accuracy
of 97% in our comparison with other machine learning algorithms. It also
provides a diet plan based on the patient's age and BMI category.
Furthermore, patient information is stored in an electronic health record
and is secured using the InterPlanetary File System.
FUTURE WORK
REFERENCE
• Sridevi Krishnan, Erik R. Gertz, Sean H. Adams, John W. Newman, Theresa L.
Pedersen, Nancy L. Keim, Brian J. Bennett / Effects of a diet based on the Dietary
Guidelines on vascular health and TMAO in women with cardiometabolic risk
factors / Nutrition, Metabolism & Cardiovascular Diseases 32 (2022) 210-219
• Mahmood Safaei, Elankovan A. Sundararajan, Maha Driss, Wadii Boulila,
Azrulhizam Shapi’i / A systematic literature review on obesity: Understanding the
causes & consequences of obesity and reviewing various machine learning
approaches used to predict obesity / Computers in Biology and Medicine 136
(2021) 104754
• Lara Lama, Oskar Wilhelmsson, Erik Norlander, Lars Gustafsson, Anton Lager, Per
Tynelius, Lars Warvik, Claes-Goran Ostenson / Machine learning for prediction of
diabetes risk in middle-aged Swedish people / Heliyon 7 (2021) e07419
• Md. Mazharul Islam, Rittika Shamsuddin / Machine learning to promote health
management through lifestyle changes for hypertension patients / Array 12 (2021)
100090
• Akhilendra Pratap Singh, Nihar Ranjan Pradhan, Ashish K. Luhach, Sivansu
Agnihotri, Noor Zaman Jhanjhi, Sahil Verma, Kavita, Uttam Ghosh, Diptendu Sinha
Roy / A Novel Patient-Centric Architectural Framework for Blockchain-Enabled
Healthcare Applications / IEEE Transactions on Industrial Informatics, vol. 17,
no. 8, August 2021
• Rosario Catelli, Francesco Gargiulo, Valentina Casola, Giuseppe De Pietro,
Hamido Fujita, Massimo Esposito / A Novel COVID-19 Data Set and an Effective Deep
Learning Approach for the De-Identification of Italian Medical Records / Digital
Object Identifier 10.1109/ACCESS.2021.3054479
• Jyotismita Chaki, S. Thillai Ganesh, S.K Cidham, S. Ananda Theertan / Machine
learning and artificial intelligence based Diabetes Mellitus detection and
self-management: A systematic review / Journal of King Saud University -
Computer and Information Sciences (2020)
THANK YOU