
Project Title-: Chronic Kidney Disease

Guide-: Dr J Manikandan

Team members-: 1. Soumya Mehta 20MIM10025

2. Kushagra Bande 20MIM10049

3. Riya Deohare 20MIM10059

4. Rohan Vinay Chaudhary 20MIM10081

5. Priyanshu Sharma 20MIM10104

Introduction-: Chronic kidney disease (CKD), also called chronic renal disease, has become a major
health issue, and its prevalence is growing steadily.
CKD is a condition in which kidney function is insufficient, so patients have to live with a
compromised quality of life.

CKD is a substantial financial burden on patients, healthcare services, and the government.

Machine learning algorithms can help detect and predict diseases more accurately.

With the availability of biomedical data, the use of machine-learning techniques in healthcare for
developing disease prediction models has become common. Further, methods such as deep learning
and techniques like ensemble learning have greatly improved the predictive power of machine
learning models.

Some classifiers do not fit the data set used in this context.

Some of the machine learning approaches under consideration are not viable for large volumes of
data.

Certain methodologies are incompatible with the collection of real-time data and with the
procedures for implementing it.

It is important to have effective methods for early prediction of CKD.


Existing Work with Limitations-: A fuzzy logic approach has been developed for the classification of
patients with CKD.

Data mining techniques have been used to draw conclusions about the characteristics of patients of
all kinds who have liver diseases.

Proposed work-: The classification performance for the disease is further improved.

Time Complexity and accuracy can be measured.

Risk factors can be predicted early.

Different machine learning algorithms can give highly accurate results.
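As a minimal sketch of how accuracy and running time could be measured, the snippet below scores a list of predictions and times a single call. The helper names `accuracy` and `timed` and the labels are hypothetical and for illustration only; they are not project results. Wall-clock time is only an empirical proxy for time complexity.

```python
import time


def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty label list")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Hypothetical labels, for illustration only (not measured data).
y_true = ["ckd", "ckd", "notckd", "notckd", "ckd"]
y_pred = ["ckd", "notckd", "notckd", "notckd", "ckd"]

score, seconds = timed(accuracy, y_true, y_pred)
print(f"accuracy = {score:.2f}")  # 4 of 5 labels match
```

The same `timed` wrapper could be placed around the training and prediction calls of each classifier so that the models are compared on both accuracy and running time.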

Methodology-:

This technology can achieve accurate and economical diagnoses of diseases; hence, it might be a
promising method for diagnosing CKD.

Data Extraction (to gather the dataset)

Data Pre-processing (missing value handling and feature selection)

Format and reduce data

Complete data cleaning

Data analysis

Data exploration

Model Training

Prediction using pre-processed data

Taking input from the user and predicting the person's health (form)
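The missing value handling and train/test steps listed above could be sketched as follows. This is a plain-Python illustration under stated assumptions: the column meanings (age, blood pressure, serum creatinine), the sample values, and the helper names `impute_mean` and `train_test_split` are all hypothetical, and mean imputation is only one possible way to handle missing values.

```python
import random


def impute_mean(rows):
    """Replace None in each numeric column with that column's mean.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [
        [means[j] if r[j] is None else r[j] for j in range(n_cols)]
        for r in rows
    ]


def train_test_split(rows, labels, test_ratio=0.25, seed=0):
    """Shuffle indices reproducibly and split rows/labels into train and test."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(rows) * test_ratio))
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def pick(seq, ids):
        return [seq[i] for i in ids]

    return (pick(rows, train_idx), pick(rows, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


# Hypothetical records: [age, blood pressure, serum creatinine]; None = missing.
rows = [
    [48, 80, 1.2],
    [62, None, 3.9],
    [None, 70, 0.8],
    [51, 90, None],
]
labels = ["notckd", "ckd", "notckd", "ckd"]

filled = impute_mean(rows)
X_train, X_test, y_train, y_test = train_test_split(filled, labels)
print(len(X_train), "training rows,", len(X_test), "test row(s)")
```

Fixing the seed keeps the split reproducible, so that different classifiers are evaluated on the same held-out rows.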


Data Pre-processing

Hence, we speculated that this methodology could be applicable to more complicated clinical data
for disease diagnosis.

Novelty of Project-: EDA-: Meaningful, in-depth representation of the data in the form of graphs and
charts.

Model Optimisation-: The model will offer an optimised solution that is currently missing.

Accuracy-: The accuracy of prediction will be improved.

Real Time Usage -:

Hardware & software requirements-: Jupyter, Spyder, VS Code.

Libraries used-: numpy, matplotlib, plotly, seaborn, pandas, x


Overall system architecture diagram-: data collection, analysis, data pre-processing, final
dataset, test and train, accuracy, prediction

Conclusion
The main advantage of the existing work is its use of comorbidities for predicting renal replacement
therapy (RRT). A large heterogeneous population should be used to create the model and evaluate its
performance before it can be applied to clinical practice. By using ML algorithms, our study
provides a screening approach for predicting the chance of upcoming RRT based on clinical data;
it should therefore be considered neither a clinical guideline nor a diagnostic or therapeutic
tool for CKD patients. On the other hand, the results at this point are more interesting from the
point of view of policy-makers, such as hospital managers, health officials, or insurance companies.
Using predictive models on a general population with the data available can allow for better
planning and allocation of resources. Future scope lies in developing a prediction model that
factors in more clinical data (use of specific drugs, associated comorbidities, dietary
interventions, degree of blood pressure control, degree of blood sugar control) when predicting
outcomes, which could make it possible to tailor therapeutic interventions accordingly.
The ML algorithms in our study should be regarded as a possible screening tool for predicting how
long a CKD patient's disease will take to progress before he or she needs RRT.

This project is a medical-sector application that helps medical practitioners predict
CKD from CKD parameters. It automates CKD prediction and identifies the disease and its
stages in an efficient and economical manner. This is accomplished by applying the KNN
and Naive Bayes algorithms for classification, a technique that falls under data
mining. The algorithms take CKD parameters as input and predict the disease based on
data from previous CKD patients.
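To make the two classifiers named above concrete, here is a minimal plain-Python sketch of KNN (majority vote over Euclidean neighbours) and Gaussian Naive Bayes. The feature choice (serum creatinine, haemoglobin), the sample values, and the function names are assumptions made for illustration; this is not the project's actual implementation, and in practice the features would normally be scaled before KNN is applied.

```python
import math
from collections import Counter, defaultdict


def predict_knn(X, y, row, k=3):
    """Label row by majority vote among its k nearest training samples."""
    ranked = sorted(zip(X, y), key=lambda pair: math.dist(pair[0], row))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate the prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor avoids division by zero for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: [serum creatinine, haemoglobin].
X = [[4.1, 9.5], [3.2, 10.1], [5.0, 8.7], [0.9, 15.0], [1.1, 14.2], [0.8, 15.8]]
y = ["ckd", "ckd", "ckd", "notckd", "notckd", "notckd"]

nb_model = fit_gaussian_nb(X, y)
for patient in ([3.8, 9.9], [1.0, 14.9]):
    print(patient, predict_knn(X, y, patient), predict_gaussian_nb(nb_model, patient))
```

KNN keeps the whole training set and defers all work to prediction time, whereas Naive Bayes summarises each class by a few statistics, which is one reason the two are often compared on both accuracy and running time.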

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see
https://creativecommons.org/licenses/by/4.0/. This article has been accepted for publication in a
future issue of this journal, but has not been fully edited. Content may change prior to final
publication. Citation information: DOI 10.1109/ACCESS.2019.2963053, IEEE Access
samples, such as general similarity coefficient [37]. In this study, we used Euclidean distance to
evaluate the similarity between samples, and KNN could obtain a good result based on Euclidean
distance with the highest accuracy of 99.25%. Therefore, we did not use other methods to evaluate
the similarity between samples.

V. CONCLUSION

The proposed CKD diagnostic methodology is feasible in terms of data imputation and sample
diagnosis. After unsupervised imputation of missing values in the data set by using KNN imputation,
the integrated model could achieve a satisfactory accuracy. Hence, we speculate that applying this
methodology to the practical diagnosis of CKD would achieve a desirable effect. In addition, this
methodology might be applicable to the clinical data of other diseases in actual medical diagnosis.
However, in the process of establishing the model, due to the limitations of the conditions, the
available data samples are relatively small, including only 400 samples. Therefore, the
generalization performance of the model might be limited. In addition, because there are only two
categories (ckd and notckd) of data samples in the data set, the model cannot diagnose the severity
of CKD. In the future, a large number of more complex and representative data will be collected to
train the model to improve the generalization performance while enabling it to detect the severity
of the disease. We believe that this model will keep improving as the size and quality of the data
increase.
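The KNN imputation with Euclidean distance described in the excerpt above can be illustrated with a short plain-Python sketch. It is an assumption-laden illustration, not the cited paper's exact procedure: the function names, the choice to measure distance only over features observed in both rows, the unscaled features, and the sample values are all hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance over the features observed in both rows.

    Returns None when the two rows share no observed feature.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def knn_impute(rows, k=2):
    """Fill each missing value with the mean of that feature in the k nearest rows."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours: other rows in which feature j is observed.
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                d = distance(row, other)
                if d is not None:
                    candidates.append((d, other[j]))
            if not candidates:
                continue  # no usable neighbour; leave the gap as it is
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


# Hypothetical two-feature records; None marks a missing value.
rows = [
    [1.0, 10.0],
    [1.2, 12.0],
    [5.0, 50.0],
    [1.1, None],
]

imputed = knn_impute(rows, k=2)
print(imputed[3])  # the gap is filled from the two closest rows (first and second)
```

Because the neighbours are found without using the class label, the imputation is unsupervised in the sense the excerpt describes.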
