Breast Cancer Tumor Prediction using XGBOOST Algorithm

1 Problem: Breast Cancer Classification

Classify breast cancer tumors as benign or malignant based on cell features.

2 Dataset Structure

dataset.shape returns (683, 11)

3 Summarize Dataset

dataset.head(5)
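A hedged sketch of these first steps, assuming the data lives in a CSV file (the filename breast_cancer.csv and the pandas workflow are assumptions, not from the source):

import pandas as pd

# Load the dataset; the filename is an assumption for illustration.
dataset = pd.read_csv("breast_cancer.csv")
print(dataset.shape)   # expected: (683, 11) per section 2
dataset.head(5)        # summarize: first five rows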

4 Segregating Dataset into X & Y

5 Splitting Dataset into Train & Test
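A sketch of sections 4 and 5, assuming the pandas DataFrame from above; treating column 0 as an ID and the last column as the class label is an assumption about this dataset's layout:

from sklearn.model_selection import train_test_split

# Segregate features (X) and target (y); the column layout is assumed.
X = dataset.iloc[:, 1:-1].values   # cell-measurement features
y = dataset.iloc[:, -1].values     # class label (benign / malignant)

# Split into train and test sets; the 75/25 split is an illustrative choice.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)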

Bias & Variance

Bias: the difference between the average prediction of our model and the actual value. A model with high bias leads to high error on both training and test data.

Variance: the variability of model predictions for a given data point (the spread of the data). A model with high variance pays a lot of attention to the training data and does not generalize to new data.

Bagging only controls for high variance in a model, whereas boosting algorithms play a crucial role in dealing with both bias and variance.
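For reference, the standard definitions in equation form (a sketch in the usual notation: \hat{f} is the trained model, f the true function, \sigma^2 the irreducible noise):

\mathrm{Bias}[\hat{f}(x)] = \mathbb{E}[\hat{f}(x)] - f(x)

\mathrm{Var}[\hat{f}(x)] = \mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]

\mathbb{E}[(y - \hat{f}(x))^2] = \mathrm{Bias}^2 + \mathrm{Var} + \sigma^2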

Boosting

Overview: Boosting is a sequential technique which works on the principle of ensemble learning. It combines a set of weak learners and delivers improved prediction accuracy.

Process: At any instant t, the model outcomes are weighed based on the outcomes of the previous instant t-1. The outcomes predicted correctly are given a lower weight, and the ones misclassified are weighted higher (a minimal sketch of this reweighting follows below).
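A minimal sketch of the reweighting loop in the AdaBoost style, the classic instance of this process (not XGBoost's exact update); scikit-learn's built-in breast-cancer dataset stands in for the course's CSV:

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
y = np.where(y == 0, -1, 1)               # AdaBoost convention: labels in {-1, +1}
w = np.full(len(y), 1 / len(y))           # start with uniform sample weights

for t in range(5):                        # five boosting rounds
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = w[pred != y].sum()              # weighted error at instant t
    alpha = 0.5 * np.log((1 - err) / err) # this learner's vote weight
    w *= np.exp(-alpha * y * pred)        # misclassified up-weighted, correct down-weighted
    w /= w.sum()                          # renormalize to a distribution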

Gradient Boosting Machine Parameters

Types of Parameters:
- Tree-Specific Parameters: affect each individual tree
- Boosting Parameters: affect the boosting operation
- Miscellaneous Parameters: affect the overall functioning

6 Steps of GBM (sketched in code below)

1 Initialize the outcome
2 Iterate from 1 to the total number of trees:
- Update the weights for targets based on the previous run
- Fit the model on a selected subsample of data
- Make predictions on the full set of observations
- Update the output with the current results, taking the learning rate into account
3 Return the final output
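A minimal sketch of these steps for squared-error regression (a hypothetical illustration on toy data; real GBM implementations add subsampling, line search, and more):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))          # toy data, an assumption
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 200)

learning_rate, n_trees = 0.1, 50
F = np.full(len(y), y.mean())                  # step 1: initialize the outcome
for t in range(n_trees):                       # step 2: iterate over the trees
    residual = y - F                           # targets updated from the previous run
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    F += learning_rate * tree.predict(X)       # update output, scaled by learning rate
print(F[:5])                                   # step 3: F holds the final output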

XGBOOST - eXtreme Gradient Boosting

Overview

1 Ensemble Learning: it also combines the results of many models
2 Like Random Forests, it uses Decision Trees as base learners
3 Individual decision trees are low-bias, high-variance models
4 The trees used by XGBoost are different: instead of containing a single decision in each "leaf" node, they contain real-valued scores of whether an instance belongs to a group. After the tree reaches max depth, the decision can be made by converting the scores into categories using a certain threshold
5 It has regularization, whereas the GBM implementation has no regularization; this reduces overfitting
6 It implements parallel processing
7 Tree Pruning: GBM stops splitting a node when it encounters a negative loss in the split, so it is more of a greedy algorithm. XGBoost, on the other hand, makes splits up to the max_depth specified and then prunes the tree backwards, removing splits beyond which there is no positive gain
8 Built-in Cross-Validation (see the sketch after this list)
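A minimal sketch of the built-in cross-validation via xgb.cv, again using scikit-learn's breast-cancer data as a stand-in for the course's CSV:

import xgboost as xgb
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
dtrain = xgb.DMatrix(X, label=y)

# 5-fold CV over 50 boosting rounds; parameter values are illustrative.
cv_results = xgb.cv(
    {"objective": "binary:logistic", "max_depth": 3, "eta": 0.1},
    dtrain,
    num_boost_round=50,
    nfold=5,
    metrics="error",
)
print(cv_results.tail(1))   # mean train/test error of the final round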

Parameters

- General Parameters: guide the overall functioning
- Booster Parameters: guide the individual booster (tree or linear) at each step
- Learning Task Parameters: guide the optimization performed
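A sketch of how these three groups map onto the xgboost API (values are illustrative, not tuned):

import xgboost as xgb

params = {
    "booster": "gbtree",               # general parameter: which booster to use
    "nthread": 4,                      # general parameter: overall functioning
    "max_depth": 3,                    # booster parameter: shape of each tree
    "eta": 0.1,                        # booster parameter: learning rate
    "objective": "binary:logistic",    # learning task parameter: optimization objective
    "eval_metric": "logloss",          # learning task parameter: evaluation metric
}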

7 Training with XGBOOST
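A hedged sketch of the training step using the scikit-learn wrapper; X_train, y_train, and X_test come from the split in section 5 (the variable names are assumptions):

from xgboost import XGBClassifier

# Train the classifier; hyperparameter values are illustrative.
model = XGBClassifier(objective="binary:logistic", max_depth=3, learning_rate=0.1)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)   # predictions evaluated in section 8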

8 Confusion Matrix
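A sketch of evaluating the predictions, assuming y_test and y_pred from the training step above:

from sklearn.metrics import confusion_matrix, accuracy_score

cm = confusion_matrix(y_test, y_pred)   # rows: actual class, columns: predicted class
print(cm)
print("Accuracy:", accuracy_score(y_test, y_pred))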

9 K-Fold Cross Validation

It is a procedure used to estimate the skill of the model on new data. The parameter k refers to the number of groups that a given data sample is split into.
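A minimal sketch with k = 10, assuming the model and the full X, y from the sections above; cross_val_score refits the model on each fold and scores it on the held-out group:

from sklearn.model_selection import cross_val_score

scores = cross_val_score(model, X, y, cv=10)
print("Mean accuracy: %.2f%%" % (scores.mean() * 100))
print("Standard deviation: %.2f%%" % (scores.std() * 100))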
