
ANALYSIS OF CHRONIC KIDNEY DISEASE
GUIDE : PROF. KUMAR DEBASIS

Team members :

V. Srinivas (18MIS7011)
G. Leela Sai Abhinav (18MIS7020)
C.V.S.R. Rohith (18MIS7262)
INTRODUCTION:

● Chronic Kidney Disease (also known as Chronic Renal Disease) is a kidney ailment in which a person's kidneys fail gradually over the course of months or years.
● This ailment is especially common in third-world countries, with causes ranging from hypertension (high blood pressure) to diabetes and inflammation.
● With this project, we wanted to predict, as accurately as possible using machine learning, whether a person is susceptible to Chronic Kidney Disease, so as to help further the fight against a deadly disease that costs millions of lives all over the world.
ABSTRACT :

• Prevention is better than cure

• Machine Learning helps in early diagnosis

• Aim is alerting potential victims through early recognition


● Attributes highly correlated with the outcome are preferred, giving an edge over
existing models.
OBJECTIVES :

• Usage of key, non-redundant attributes

• Risk assessment of a person's susceptibility to CKD
• Usage of apt machine learning algorithms to minimize misdiagnosis
PROPOSED MODEL :
METHODOLOGIES :
• STEP-1: Dataset Pre-processing
Outliers, Missing Values, Data Reduction, Data Normalization
• STEP-2: Exploratory Data Analysis (Visualizations)
• STEP-3: Applying Machine Learning Algorithms (Modeling)

●LOGISTIC REGRESSION

●ARTIFICIAL NEURAL NETWORKS

●XGBOOST

●ADABOOST

●XTREE

• The modeling technique that achieves the highest accuracy is selected as the most
efficient and reliable.
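The pre-processing in STEP-1 can be sketched as follows. This is a minimal illustration on a tiny synthetic table; the column names are stand-ins, not the real CKD dataset attributes:

```python
import numpy as np
import pandas as pd

# Tiny synthetic stand-in for the CKD dataset (illustrative columns only).
df = pd.DataFrame({
    "blood_pressure": [80, 70, np.nan, 90, 500],   # 500 is an obvious outlier
    "serum_creatinine": [1.2, np.nan, 0.9, 3.1, 1.1],
    "ckd": [1, 0, 0, 1, 1],
})
features = ["blood_pressure", "serum_creatinine"]

# Missing values: fill numeric gaps with the column median.
for col in features:
    df[col] = df[col].fillna(df[col].median())

# Outliers: clip values outside 1.5 * IQR of each column.
for col in features:
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    df[col] = df[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Normalization: min-max scale features into [0, 1].
df[features] = (df[features] - df[features].min()) / (df[features].max() - df[features].min())
```

After these steps every feature is free of missing values and lies in [0, 1], ready for the modeling stage.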
WHY Logistic Regression ?

● We use Logistic Regression to convert the straight best-fit line of linear regression into an S-curve via the
sigmoid function, which always outputs values between 0 and 1.

● Logistic Regression is considered here because CKD prediction has two possible outcomes: yes or no.

● No assumptions about distributions of classes in feature space are made.


IMPLEMENTATION :

1) Logistic Regression :
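A minimal sketch of how the logistic-regression step could look in scikit-learn. The synthetic data stands in for the CKD dataset, and all settings are illustrative assumptions, not the project's actual code:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic binary-classification data standing in for the CKD features.
X, y = make_classification(n_samples=400, n_features=10, n_informative=6,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# The sigmoid maps the linear score w.x + b to a probability in (0, 1);
# LogisticRegression applies it internally.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]   # estimated P(CKD) per patient
acc = accuracy_score(y_test, model.predict(X_test))
```

Thresholding `proba` at 0.5 gives the yes/no decision the slides describe.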
WHY XGBoost ?

● In this algorithm, decision trees are created in sequential form.

● Weights play an important role here. Weights are assigned to all the training
observations, which are then fed into the decision tree that predicts results.

● The weights of observations predicted wrongly by the tree are increased, and these observations are
then fed to the second decision tree.

● These individual classifiers/predictors then ensemble to give a strong and more precise
model.
2) XGBOOST :
XGBOOST (Continuation) :
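A hedged sketch of the sequential boosting step. Since the `xgboost` package may not be available, scikit-learn's GradientBoostingClassifier stands in here for `xgboost.XGBClassifier`; the data and parameters are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Trees are grown sequentially: each new tree corrects the errors left by
# the previous ones, which is the boosting idea described above.
model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                   max_depth=3, random_state=0)
model.fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
```

With the real library installed, `xgboost.XGBClassifier` exposes the same `fit`/`predict` interface.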
WHY ANN ?

● Artificial Neural Networks (ANNs) have many different coefficients that they can
optimize, so they can handle much more variability compared to traditional
models.

● With time, an ANN trains itself to detect all possible scenarios that contribute to a
person being affected by CKD.

● An ANN stores what it learns in its weights, so we can use it in scenarios where
we need to deal with data that consumes a huge amount of memory.
3) Artificial Neural Network (ANN) :
ANN (Continuation):
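A minimal ANN sketch using scikit-learn's MLPClassifier, a small feed-forward network. Layer sizes and data are illustrative assumptions, not the project's actual architecture:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=400, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1)

# Neural networks are sensitive to feature scale, so standardize first.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# A small feed-forward network; its many weights ("coefficients") are
# what give the model its flexibility.
model = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000,
                      random_state=1)
model.fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
```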
WHY ADABOOST ?
● Weak models are added sequentially, trained using the weighted training data.

● The process continues until a pre-set number of weak learners have been created (a user parameter) or no further
improvement can be made on the training dataset.

● Once completed, you are left with a pool of weak learners each with a stage value.

● Predictions are made by calculating the weighted average of the weak classifiers.

● For a new input instance, each weak learner calculates a predicted value as either +1.0 or -1.0.

● The predicted values are weighted by each weak learner's stage value. The prediction for the ensemble model is taken as
the sum of the weighted predictions: a positive sum predicts one class, a negative sum the other.
4) ADABOOST :
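A minimal AdaBoost sketch with scikit-learn, on illustrative synthetic data. Its default weak learner is a depth-1 decision stump, matching the sequential reweighting scheme described above:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=400, n_features=10, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=2)

# Default weak learner: a depth-1 decision stump. Misclassified samples
# are up-weighted before each new stump is trained, and every stump
# receives a stage value used to weight its vote.
model = AdaBoostClassifier(n_estimators=50, random_state=2)
model.fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
```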
WHY XTREE ?

● Extremely Randomized Trees Classifier (Extra Trees Classifier) is a type of ensemble learning
technique which aggregates the results of multiple de-correlated decision trees collected in a "forest" to
output its classification result.

● The Extra-Trees algorithm builds an ensemble of unpruned decision or regression trees according to the
classical top-down procedure.

● Its two main differences with other tree-based ensemble methods are that it splits nodes by choosing cut-
points fully at random and that it uses the whole learning sample (rather than a bootstrap replica) to grow
the trees.
5) XTREE :
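A minimal Extra Trees sketch with scikit-learn, on illustrative synthetic data. Note `bootstrap=False`: each tree is grown on the whole learning sample, with fully random cut-points, as described above:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=400, n_features=10, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=3)

# Extra Trees: split cut-points are chosen fully at random, and each
# unpruned tree uses the entire training set (no bootstrap replica).
model = ExtraTreesClassifier(n_estimators=100, bootstrap=False,
                             random_state=3)
model.fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
```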
COMPARATIVE ANALYSIS

TECHNIQUE            | Accuracy in our prediction (%) | Accuracy in predictions preceding our work (%)
LOGISTIC REGRESSION  | 98.27                          | 97.45
ANN                  | 99.56                          | 97.56
XGBOOST              | 96.55                          | 96.25
XTREE                | 98.33                          | —NA—
ADABOOST             | 96.55                          | 95.00


THANK YOU !
