Learning Techniques
PRESENTED BY:
Madhushree M
PROBLEM STATEMENT
Breast cancer is one of the most feared diseases in the world, with a high
fatality rate. It is the second most common cause of cancer death among women, after lung cancer.
The health care sector in India faces several challenges, such as an inadequate
number of doctors and a lack of standardized procedures for disease
diagnosis.
Another major challenge is the huge amount of patient data generated
by various types of scans, which calls for automated, cost-effective and fast
processing to support accurate diagnosis.
The final issue is the reliance on human intervention in diagnosis, which may lead to
inaccurate diagnoses and delays in treatment.
Early diagnosis is needed to provide proper treatment and reduce the mortality
rate.
So, our aim is to create a standardized, customizable and affordable breast
cancer detection/classification system using machine learning techniques.
METHODOLOGY
[Figure: classification prediction model that labels each sample as Benign or Malignant]
Dataset used: the Wisconsin Breast Cancer (Diagnostic) dataset from the UCI Machine
Learning Repository.
It is openly available in CSV format.
The dataset contains 569 breast cancer biopsy samples, each described by 32
attributes: an ID, the diagnosis (benign or malignant) and 30 real-valued features.
After data pre-processing, 15 unique features remain.
Evaluation metrics
The performance of each model is measured using the following
metrics.
Confusion matrix: a table that summarizes the performance of a classification
algorithm by counting true positives, false positives, true negatives and false negatives.
F1 score: the harmonic mean of precision and recall. It gives a better
measure of incorrectly classified cases than accuracy alone.
We use the F1 score to evaluate the classification models,
since it takes both false positives and false negatives into account.
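As a minimal sketch of how these two metrics are computed, the Python below counts the confusion-matrix cells and derives precision, recall and F1, treating 1 (malignant) as the positive class. The function names and the sample labels are hypothetical and are not the project's data or results; libraries such as scikit-learn ship equivalent functions.

```python
from typing import Dict, List


def confusion_matrix(y_true: List[int], y_pred: List[int]) -> Dict[str, int]:
    """Count TP, FP, TN, FN, treating 1 (malignant) as the positive class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            counts["tp"] += 1
        elif p == 1 and t == 0:
            counts["fp"] += 1
        elif p == 0 and t == 0:
            counts["tn"] += 1
        else:  # predicted benign, actually malignant
            counts["fn"] += 1
    return counts


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    c = confusion_matrix(y_true, y_pred)
    precision = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
    recall = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels (1 = malignant, 0 = benign), not project results.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(confusion_matrix(y_true, y_pred))
print(round(f1_score(y_true, y_pred), 3))
```

Here one false negative and one false positive each pull F1 down, which is why the metric reflects both kinds of error.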
IMPLEMENTATION
• We have used 3 machine learning algorithms for breast cancer
detection/classification.
• Algorithms used: K-Nearest Neighbour, Support Vector Machine, Logistic
Regression.
MODEL 1: KNN
Parameters:
CASE 1: n_neighbors = 5, metric = minkowski, p = 1
CASE 2: n_neighbors = 5, metric = minkowski, p = 2
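The two cases differ only in the Minkowski exponent p: p = 1 gives the Manhattan distance and p = 2 the Euclidean distance. The sketch below is a from-scratch KNN classifier illustrating this. The parameter names mirror scikit-learn's KNeighborsClassifier (n_neighbors, metric, p), but the helper functions and the toy points are hypothetical, not the project's code or data.

```python
from collections import Counter
from typing import List, Sequence


def minkowski(a: Sequence[float], b: Sequence[float], p: int) -> float:
    """Minkowski distance; p = 1 is Manhattan, p = 2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def knn_predict(
    train_X: List[Sequence[float]], train_y: List[int],
    query: Sequence[float], n_neighbors: int = 5, p: int = 2,
) -> int:
    """Label the query by majority vote among its n_neighbors nearest points."""
    dists = sorted(
        (minkowski(x, query, p), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:n_neighbors])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D training points: 0 = benign, 1 = malignant.
train_X = [[1, 1], [1, 2], [2, 1], [2, 2], [3, 2],
           [6, 6], [6, 7], [7, 6], [7, 7], [8, 8]]
train_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

for p in (1, 2):  # CASE 1 uses p = 1, CASE 2 uses p = 2
    print(p,
          knn_predict(train_X, train_y, [2, 3], n_neighbors=5, p=p),
          knn_predict(train_X, train_y, [6, 5], n_neighbors=5, p=p))
```

With two classes, an odd n_neighbors such as 5 means the majority vote can never tie.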
MODEL 2: Support Vector Machine
MODEL 3: Logistic Regression
Parameters (C is the inverse of the regularization strength; smaller values mean stronger regularization):
CASE 1: C = 0.1
CASE 2: C = 1
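To illustrate what C controls, the sketch below fits a one-feature logistic regression by plain batch gradient descent on the objective 0.5 * w^2 + C * (sum of log-losses), which is, as we understand it, the convention behind the C parameter of scikit-learn's LogisticRegression. The function fit_logreg, the unpenalized intercept, the learning rate, the epoch count and the toy data are all assumptions made for illustration.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logreg(
    X: List[List[float]], y: List[int], C: float = 1.0,
    lr: float = 0.01, epochs: int = 5000,
) -> Tuple[List[float], float]:
    """Hypothetical helper: gradient descent on 0.5*||w||^2 + C*sum(log-loss).

    The intercept b is not penalized in this sketch.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5 * ||w||^2 penalty term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = sigmoid(z) - yi  # derivative of the log-loss w.r.t. z
            for j in range(n_features):
                grad_w[j] += C * err * xi[j]
            grad_b += C * err
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical one-feature data: 0 = benign, 1 = malignant.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [0, 0, 0, 1, 1, 1]

w_strong, b_strong = fit_logreg(X, y, C=0.1)  # CASE 1: stronger regularization
w_weak, b_weak = fit_logreg(X, y, C=1.0)      # CASE 2: weaker regularization
print("C=0.1 weight:", round(w_strong[0], 3))
print("C=1   weight:", round(w_weak[0], 3))
```

The smaller C shrinks the learned weight more strongly toward zero, giving a flatter sigmoid; the larger C lets the model fit the training data more closely.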
LOGISTIC REGRESSION
What is logistic regression?
Logistic regression is a statistical method for classifying objects.
It can be used to solve binary classification problems.
Logistic regression is a regression model in which the dependent variable is
categorical, for example:
A doctor classifies the tumour as malignant or benign.
A bank transaction may be fraudulent or genuine.
Every incoming mail is spam or not spam.
Logistic regression is one of the machine learning techniques used to solve this
kind of binary classification problem.
Why logistic regression?
• It produces results in a binary format, which is used to predict the
outcome of a categorical dependent variable, for example whether a given animal is a cat or a rat.
• The outcome is therefore discrete (categorical), such as 0/1, yes/no or true/false.
Why can’t we use linear regression?
• Examples: salary vs. experience (a continuous output, suited to linear regression) versus tumour detection (a discrete output).
We use linear regression when the output is a continuous value within a range. In this case, however,
the output is discrete, i.e. either 0 or 1.
For linear regression the hypothesis is $h(x) = \theta^T x + C$, where h(x) is the predicted output, θ is the slope, x is the input data and C is the
intercept.
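A quick worked example of the problem: fitting an ordinary least-squares line to 0/1 labels produces predictions outside the range 0 to 1. The six data points below are hypothetical (think of x as a tumour-size measurement and y = 1 as malignant), and fit_line is an illustrative helper, not the project's code.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for h(x) = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical binary labels: 0 = benign, 1 = malignant.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
slope, intercept = fit_line(xs, ys)

for x in (0, 3.5, 10):
    print(x, round(slope * x + intercept, 3))
```

The line predicts about -0.4 at x = 0 and about 2.17 at x = 10; neither value can be read as a probability, which motivates squashing the output into the range 0 to 1.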
To restrict the output to the range 0 to 1, we transform the equation with a function g:
$$h(x) = g(\theta^T x) = g(z), \qquad 0 \le h(x) \le 1$$
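Sigmoid function or logistic function:
$$g(z) = \frac{1}{1 + e^{-z}}, \qquad \text{where } z = \theta^T x$$
[Figure: the sigmoid curve g(z) plotted against z, rising from 0 to 1]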
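A minimal Python version of the sigmoid and the hypothesis h(x) = g(θ^T x) might look like the following. The helper names are hypothetical, and the convention that x[0] = 1 (so that theta[0] plays the role of the intercept) is an assumption of this sketch.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic function g(z) = 1 / (1 + e^(-z)); the output lies between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equal form e^z / (1 + e^z)
    # so that math.exp never receives a large positive argument (overflow).
    ez = math.exp(z)
    return ez / (1.0 + ez)


def hypothesis(theta: Sequence[float], x: Sequence[float]) -> float:
    """h(x) = g(theta^T x). Assumes x[0] == 1 so theta[0] acts as the intercept."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)


for z in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(z, round(sigmoid(z), 4))
```

Whatever real value z = θ^T x takes, g(z) stays between 0 and 1, and g(0) = 0.5 exactly.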
LOGISTIC REGRESSION DECISION BOUNDARY
Suppose we predict "y = 1" if $h(x) \ge 0.5$. Since $g(z) \ge 0.5$ when $z \ge 0$,
$$h(x) = g(\theta^T x) \ge 0.5 \quad \text{whenever} \quad \theta^T x \ge 0$$
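[Figure: decision boundary plotted over features x1 and x2]
Take $\theta_0 = -3$, $\theta_1 = 1$, $\theta_2 = 1$. Then $\theta^T x = -3 + x_1 + x_2$, so the model predicts $y = 1$ whenever $x_1 + x_2 \ge 3$; the line $x_1 + x_2 = 3$ is the decision boundary.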
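Plugging the example parameters θ0 = -3, θ1 = 1, θ2 = 1 into a small classifier shows the resulting boundary. Thresholding z at 0 is equivalent to thresholding h(x) at 0.5 because the sigmoid is increasing and g(0) = 0.5. The function names and test points are hypothetical.

```python
import math
from typing import Sequence

# Example parameters: theta0 = -3, theta1 = 1, theta2 = 1.
THETA = (-3.0, 1.0, 1.0)


def z_value(x1: float, x2: float, theta: Sequence[float] = THETA) -> float:
    """theta^T x with x = (1, x1, x2); the leading 1 pairs with theta0."""
    return theta[0] + theta[1] * x1 + theta[2] * x2


def probability(x1: float, x2: float, theta: Sequence[float] = THETA) -> float:
    """h(x) = g(theta^T x), the estimated probability that y = 1."""
    return 1.0 / (1.0 + math.exp(-z_value(x1, x2, theta)))


def predict(x1: float, x2: float, theta: Sequence[float] = THETA) -> int:
    """Predict y = 1 when h(x) >= 0.5, i.e. when theta^T x >= 0."""
    return 1 if z_value(x1, x2, theta) >= 0 else 0


for point in [(1.0, 1.0), (1.0, 2.0), (2.0, 2.0)]:
    print(point, predict(*point), round(probability(*point), 3))
```

The point (1, 2) lies exactly on the line x1 + x2 = 3, where h(x) = 0.5; points above the line are classified as y = 1 and points below as y = 0.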