
AUTOMATED MEDICAL DIAGNOSIS

Ahmed Ezzat
Supervisors:
Prof. Magda B. Fayek
Assoc Prof. Mona Farouk

Cairo University
Ahmed.e.mohamed@eng1.cu.edu.eg

May 15, 2022

Ahmed Ezzat Supervisors: Prof. Magda B. Fayek Assoc Prof. Mona


AMDFarouk (CUFE) May 15, 2022 1 / 48
Problem Definition

GDD
To develop a machine learning algorithm that diagnoses diseases by
examining biomedical features.

Slide-Detect
To develop a deep learning algorithm that diagnoses lung infiltration by
examining chest X-ray scans.

Motivations

Motivation 1
Diabetes, cervical cancer, and lung infiltration are leading causes of
death.

Motivation 2
Unlike physicians, computer-aided diagnosis (CAD) systems can process a
large number of cases efficiently.

Motivation 3
The number of cases is increasing rapidly, especially in developing
countries.

Research Questions

Question 1
What are the most important features in diagnosing diabetes and
cervical cancer?

Question 2
How are the most important features of diabetes and cervical cancer
distributed in the hyper-space?

Question 3
How can the diagnostic accuracy of CAD be increased for diabetes,
cervical cancer, and lung infiltration?

Previous Work

Figure 1: A summary of the previous work


Diabetes pipelines

Figure 2: Diabetes pipelines

Cervical cancer pipelines

Figure 3: Cervical Cancer pipelines


Previous Work

Table 1: Comparison of the previous work techniques


Technique                Year  Dataset  Accuracy
AlexNet                  2019  chest    60.40%
VGG-16                   2019  chest    60.87%
GoogLeNet                2019  chest    58.95%
ResNet-50                2019  chest    61.27%
Feature integration      2019  chest    70.3%
Lung-heart segmentation  2019  chest    70.9%

Previous Work

Table 2: Comparison of the previous work techniques


Technique  Year  Dataset   Accuracy
MLP        2018  diabetes  88.5%
NBC        2017  diabetes  31.43%
RF         2017  diabetes  79.19%
KNN        2017  diabetes  78.9%

Previous Work

Table 3: Comparison of the previous work techniques


Technique  Year  Dataset          Accuracy
NBC        2017  cervical cancer  80%
MSO        2017  cervical cancer  80%
RF         2017  cervical cancer  80%
SVM-PCA    2017  cervical cancer  92%

Datasets

Diabetes Dataset
Name: Diabetes 130-US hospitals for years 1999-2008
Source: UCI Machine Learning Repository
Records: 100,000
Fields: 55 attributes

Cervical Cancer Dataset
Name: Cervical Cancer Wisconsin (Diagnostic)
Source: UCI Machine Learning Repository
Records: 569
Fields: 32 attributes

Datasets

Chest X-ray Dataset
Name: ChestXray-NIHCC
Source: NIH Clinical Center
Records: 112,120
Fields: numerical, categorical, and image attributes

GDD Pre-processing

GDD Pre-processing
Remove features that are obviously irrelevant to classification, such as patient ID, hospital name, and room number
Convert categorical features into numeric features (Diabetes only)
Remove features with more than 50% missing values (Diabetes only)
Remove records in which most fields are missing (Diabetes only)
Fill the remaining missing values with the feature mean (Diabetes only)
Remove features with low variance
Normalize the attributes (features)
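
The pre-processing steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code used in this work: the function name `preprocess`, the variance threshold, the "more than half missing" rule for records, and the use of min-max scaling for normalization are all assumed, and categorical features are taken to be already encoded as numbers.

```python
from statistics import mean, pvariance

def preprocess(rows, drop_cols, max_missing=0.5, min_variance=1e-3):
    """Clean a list of dict records whose values are floats or None (missing)."""
    # Drop the obviously irrelevant columns (IDs, names, ...).
    cols = [c for c in rows[0] if c not in drop_cols]
    # Drop features whose fraction of missing values exceeds max_missing.
    cols = [c for c in cols
            if sum(r[c] is None for r in rows) / len(rows) <= max_missing]
    # Drop records in which more than half of the remaining fields are missing.
    rows = [r for r in rows
            if sum(r[c] is None for c in cols) <= len(cols) / 2]
    table = {}
    for c in cols:
        known = [r[c] for r in rows if r[c] is not None]
        if not known:
            continue
        fill = mean(known)  # mean imputation
        values = [fill if r[c] is None else r[c] for r in rows]
        if pvariance(values) <= min_variance:  # low-variance filter (assumed threshold)
            continue
        lo, hi = min(values), max(values)
        table[c] = [(v - lo) / (hi - lo) for v in values]  # min-max scaling (assumed)
    return table

# Hypothetical toy records, not taken from the diabetes dataset.
rows = [
    {"patient_id": 1, "age": 50.0, "glucose": 100.0, "mostly_missing": None, "constant": 1.0},
    {"patient_id": 2, "age": 60.0, "glucose": None, "mostly_missing": None, "constant": 1.0},
    {"patient_id": 3, "age": 70.0, "glucose": 200.0, "mostly_missing": 3.0, "constant": 1.0},
    {"patient_id": 4, "age": 80.0, "glucose": 150.0, "mostly_missing": None, "constant": 1.0},
]
table = preprocess(rows, drop_cols={"patient_id"})
```

In this toy input, `mostly_missing` is dropped for having 75% missing values and `constant` for having zero variance, so only `age` and `glucose` survive.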

GDD Feature selection

GDD Feature selection
Identify the most important features using recursive feature elimination with an extra-trees classifier as the base estimator
Apply PCA with 5 components to reduce the dimensionality of the dataset (Diabetes only)
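
One possible way to express these two steps uses scikit-learn. The slides do not name a library, so this is an assumption. The synthetic data, the number of trees, and the choice to keep 5 features are also illustrative only, and the slides do not say whether PCA is applied before or after feature selection.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for the pre-processed feature matrix (not the real data).
X, y = make_classification(n_samples=200, n_features=20, n_informative=6,
                           random_state=0)

# Recursive feature elimination: the extra-trees classifier supplies the
# feature importances, and the weakest feature is discarded each round.
selector = RFE(ExtraTreesClassifier(n_estimators=50, random_state=0),
               n_features_to_select=5)
selector.fit(X, y)
top_features = [i for i, keep in enumerate(selector.support_) if keep]

# PCA with 5 components; here it is applied to the full matrix (assumed).
X_reduced = PCA(n_components=5).fit_transform(X)
```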

GDD Classification

Training procedure
Separate the dataset into two subsets: one containing only the positive class and the other only the negative class
Split each subset into training and testing sets (70% and 30%, respectively)
Cluster each training subset with the k-means algorithm into k clusters, where k ∈ [1, 20] (k can differ between the positive and negative subsets)
Save the obtained centers
Test the performance of each selected combination of centers using the test set
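
The per-class clustering step can be illustrated with a small pure-Python k-means (Lloyd's algorithm). This is a sketch under assumptions, not the implementation used in this work: the function names, the fixed iteration count, the random initialization, and the toy points are hypothetical, and the 70/30 split is assumed to have been done already.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm on a list of equal-length tuples; returns k centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize from k distinct data points
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sum(
                (a - b) ** 2 for a, b in zip(p, centers[j])))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
    return centers

def train(positive, negative, k_pos, k_neg):
    """Cluster each class separately and return (center, label) pairs."""
    return ([(c, 1) for c in kmeans(positive, k_pos)] +
            [(c, 0) for c in kmeans(negative, k_neg)])

# Hypothetical 2-D training points for each class.
positive = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0)]
negative = [(5.0, 5.0), (5.2, 5.0), (5.1, 5.1)]
centers = train(positive, negative, k_pos=2, k_neg=1)
```

The exhaustive search described above would call `train` for every pair of `k_pos` and `k_neg` in the range 1 to 20 and keep the combination with the best score on the test set.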

GDD Classification

Testing procedure
For every point in the test set, find the nearest center and assign the point to that center's class
Calculate the classification score as defined in the next frame
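
A minimal sketch of the nearest-center test step follows. It assumes squared Euclidean distance, which the slides do not state, and the centers and test points are hypothetical values chosen for illustration.

```python
def predict(point, centers):
    """Return the label of the center closest to `point` (squared Euclidean)."""
    _, label = min(centers, key=lambda cl: sum(
        (a - b) ** 2 for a, b in zip(point, cl[0])))
    return label

# Hypothetical saved centers: (coordinates, class label) pairs.
centers = [((0.05, 0.0), 1), ((1.05, 1.0), 1), ((5.1, 5.0), 0)]
# Hypothetical test set: (coordinates, true label) pairs.
test_set = [((0.2, 0.1), 1), ((4.8, 5.3), 0), ((1.0, 0.9), 1), ((3.5, 3.5), 1)]

predictions = [predict(p, centers) for p, _ in test_set]

# Tally the confusion matrix that feeds the performance metrics.
confusion = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
for (_, truth), pred in zip(test_set, predictions):
    key = ("T" if pred == truth else "F") + ("P" if pred == 1 else "N")
    confusion[key] += 1
```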

Performance Metrics

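\[ \text{Classification score} = \text{Sensitivity} + \text{Specificity} + \text{Positive Predictive Accuracy} + \text{Negative Predictive Accuracy} \]
\[ \text{Total accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
\[ \text{Sensitivity} = \frac{TP}{TP + FN} \]
\[ \text{Specificity} = \frac{TN}{TN + FP} \]
\[ \text{Positive Predictive Accuracy} = \frac{TP}{TP + FP} \]
\[ \text{Negative Predictive Accuracy} = \frac{TN}{TN + FN} \]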
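
As a worked illustration, the metrics above can be computed from the four confusion-matrix counts. The function name and the example counts are hypothetical. Because each of the four summed terms lies between 0 and 1, the classification score ranges from 0 to 4.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the metrics defined above from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppa = tp / (tp + fp)  # positive predictive accuracy
    npa = tn / (tn + fn)  # negative predictive accuracy
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppa": ppa,
        "npa": npa,
        "score": sensitivity + specificity + ppa + npa,
    }

# Hypothetical counts for illustration only.
m = classification_metrics(tp=90, tn=85, fp=15, fn=10)
```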

Slide-Detect Pre-processing

Slide-Detect Pre-processing
Separate the dataset classes into sample and control subsets
Normalize the images in each subset
Apply a series of rotations, translations, rescalings, flips, and zooms to both subsets
Save the resulting images to their corresponding subsets

Slide-Detect Feature Extraction

Slide-Detect Feature Extraction
Separate the dataset classes into sample and control subsets
Normalize the images in each subset
Apply a series of rotations, translations, rescalings, flips, and zooms to both subsets
Save the resulting images to their corresponding subsets
Crop the images with positive infiltration labels around the infection to form the positive set
Crop the healthy images at random locations to form the negative set
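
The two cropping steps might look like the following sketch. It assumes that the infected region is given as a bounding box `(x, y, width, height)` and that images are 2D lists of pixel rows at least as large as the patch. The slides specify neither, and all function names are hypothetical.

```python
import random

def crop(image, top, left, size):
    """Return a size x size patch of a 2D list-of-rows image."""
    return [row[left:left + size] for row in image[top:top + size]]

def crop_around(image, box, size):
    """Crop a patch centered on a bounding box, clamped to the image borders."""
    x, y, w, h = box
    height, width = len(image), len(image[0])
    left = min(max(x + w // 2 - size // 2, 0), width - size)
    top = min(max(y + h // 2 - size // 2, 0), height - size)
    return crop(image, top, left, size)

def crop_random(image, size, rng):
    """Crop a patch at a uniformly random location (for healthy images)."""
    height, width = len(image), len(image[0])
    top = rng.randrange(height - size + 1)
    left = rng.randrange(width - size + 1)
    return crop(image, top, left, size)

# Toy 8 x 8 "image" whose pixel value encodes its position.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
positive_patch = crop_around(image, box=(5, 5, 2, 2), size=4)
negative_patch = crop_random(image, size=4, rng=random.Random(0))
```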

Slide-Detect Classification procedure
Resize the input to a 128 × 128 × 3, 8-bit PNG image
Use the training set to train the 5-layer DNN shown below
[Figure: 5-layer DNN]

Slide-Detect Classification Metrics

Slide-Detect Classification Metrics
Load and normalize all the healthy and sample images
Scan each image progressively with a 128 × 128 window
Feed the resulting portions to the trained DNN
If 10 portions of a single image are classified as positive, predict the image as positive
Calculate the classification accuracy
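
The scan-and-vote procedure above can be sketched as follows. The non-overlapping stride of 128 and the reading of "10 portions" as "at least 10" are assumptions. The `bright` function is a hypothetical stand-in for the trained DNN, and the toy images use a 2 × 2 window to keep the example small.

```python
def windows(height, width, size=128, stride=128):
    """Yield (top, left) corners of size x size windows covering the image."""
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            yield top, left

def classify_image(image, patch_is_positive, size=128, stride=128, min_hits=10):
    """Predict positive when at least min_hits windows are classified positive."""
    hits = 0
    for top, left in windows(len(image), len(image[0]), size, stride):
        patch = [row[left:left + size] for row in image[top:top + size]]
        hits += bool(patch_is_positive(patch))
    return hits >= min_hits

def bright(patch):
    """Stand-in classifier: positive when the mean pixel value exceeds 0.5."""
    return sum(map(sum, patch)) / (len(patch) * len(patch[0])) > 0.5

# Toy 8 x 8 images scanned with 2 x 2 windows (16 windows per image).
mostly_bright = [[1] * 8 for _ in range(6)] + [[0] * 8 for _ in range(2)]
mostly_dark = [[1] * 8 for _ in range(2)] + [[0] * 8 for _ in range(6)]
result_bright = classify_image(mostly_bright, bright, size=2, stride=2)
result_dark = classify_image(mostly_dark, bright, size=2, stride=2)
```

With these settings a 1024 × 1024 image yields 64 windows of 128 × 128 pixels, so the threshold of 10 corresponds to roughly 16% of the portions.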

Diabetes Most Important features

Table 4: Feature Importance for the Diabetes dataset


Feature               Importance factor
tolazamide Up         0.299
A1Cresult None        0.100
change                0.090
insulin Down          0.065
acetohexamide Steady  0.059

Cervical Cancer Most Important features

Table 5: Feature Importance for the Cervical Cancer dataset

Feature        Importance factor
radius mean    0.287
area worst     0.266
area mean      0.209
perimeter se   0.119
texture worst  0.116

Diabetes dataset classification results

Table 6: Diabetes dataset classification results comparison

Metric                        GDD    Singh-Prasad   Singh-Halgamuge
Score                         3.994  not available  not available
Accuracy                      0.999  0.885          0.7919
Sensitivity                   1.0    not available  not available
Specificity                   0.999  not available  not available
Positive Predictive Accuracy  1.0    not available  not available
Negative Predictive Accuracy  0.994  not available  not available

Cervical Cancer dataset classification results

Table 7: Cervical Cancer dataset classification results comparison

Metric                        GDD    Ferreira-Dutra  Kharya-Soni
Score                         3.819  not available   not available
Accuracy                      0.958  0.81            0.920
Sensitivity                   0.924  not available   not available
Specificity                   0.977  not available   not available
Positive Predictive Accuracy  0.960  0.60            not available
Negative Predictive Accuracy  0.956  0.89            not available

Slide-Detect Results

Table 8: Slide-Detect Results


Technique                                        Accuracy
AlexNet                                          60.40%
GoogLeNet                                        60.87%
VGGNet-16                                        58.95%
ResNet-50                                        61.27%
Dense networks with relative location awareness  70.9%
Multiple feature integration                     70.3%
Slide-Detect                                     93.33%

Diabetes Discussion

Figure 4: Diabetes dataset after dimension reduction in 2D

Diabetes Discussion

Figure 5: Diabetes dataset after dimension reduction in 3D

Diabetes Score Progression

Figure 6: Score progression in the Diabetes dataset as the number of clusters in the positive and negative classes changes

Cervical Cancer Discussion

Figure 7: Cervical Cancer dataset after dimension reduction in 2D

Cervical Cancer Discussion

Figure 8: Cervical Cancer dataset after dimension reduction in 3D

Cervical Cancer Score Progression

Figure 9: Score progression in the Cervical Cancer dataset as the number of clusters in the positive and negative classes changes

Class distribution in the Xchest dataset

Figure 10: Class distribution in the Xchest dataset

Age distribution in the Xchest dataset

Figure 11: Age distribution among the Infiltration patients in the Xchest dataset

Conclusion

Conclusion
This work proposes two algorithms for computer-aided diagnosis of diabetes, cervical cancer, and lung infiltration
Both algorithms outperformed the state of the art on the same datasets, achieving accuracies of 0.999 (diabetes) and 0.958 (cervical cancer) for GDD and 0.9333 for Slide-Detect

Future Work

Future Work
The GDD algorithm performs an exhaustive search for the optimum number of clusters, which is computationally expensive. A binary search algorithm may achieve the same results in logarithmic time
The Slide-Detect algorithm can be extended to support 3D scans

The End
