Ahmed Ezzat
Supervisors:
Prof. Magda B. Fayek
Assoc. Prof. Mona Farouk
Cairo University
Ahmed.e.mohamed@eng1.cu.edu.eg
GDD
To develop a machine learning algorithm to diagnose diseases by
examining bio-medical features.
Slide-Detect
To develop a deep learning algorithm to diagnose lung infiltration by
examining chest X-ray scans.
Motivation 1
Diabetes, cervical cancer, and lung infiltration are leading causes of
death.
Motivation 2
Unlike physicians, Computer Aided Diagnosis (CAD) systems can process
large numbers of cases efficiently.
Motivation 3
The number of cases is increasing rapidly, especially in developing
countries.
Question 1
What are the most important features for diagnosing diabetes and
cervical cancer?
Question 2
How are the most important features of diabetes and cervical cancer
distributed in the hyper-space?
Question 3
How can CAD diagnostic accuracy be increased for diabetes, cervical
cancer, and lung infiltration?
Diabetes Datasets
Name: Diabetes 130-US Hospitals for Years 1999-2008
Source: UCI Machine Learning Repository
Records: 100,000
Fields: 55 attributes
Dataset
Name: ChestXray-NIHCC
Source: NIH Clinical Center
Records: 112,120
Fields: numerical, categorical, and image attributes
GDD Pre-processing
Removing features that are obviously irrelevant to the classification
process, such as patient ID, hospital name, room number, etc.
Converting categorical features into numeric features (Diabetes only)
Removing features with more than 50% missing values (Diabetes only)
Removing records with most of their fields missing (Diabetes only)
Filling the remaining missing values with the mean value (Diabetes only)
Removing features with low variance
Normalizing the attributes (features)
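The pre-processing steps above can be sketched with pandas; this is a minimal illustration, not the exact pipeline, and the identifier column names are placeholders:

```python
# Sketch of the GDD pre-processing steps on a pandas DataFrame `df`.
# Column names in `id_cols` are hypothetical examples of irrelevant features.
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame, id_cols=("patient_id", "room_number")) -> pd.DataFrame:
    # 1. Drop obviously irrelevant identifier columns.
    df = df.drop(columns=[c for c in id_cols if c in df.columns])
    # 2. Encode categorical (string) features as integer codes; keep NaN as NaN.
    for col in df.select_dtypes(include="object").columns:
        df[col] = df[col].astype("category").cat.codes.replace(-1, np.nan)
    # 3. Drop features with more than 50% missing values.
    df = df.loc[:, df.isna().mean() <= 0.5]
    # 4. Drop records where most fields are missing.
    df = df[df.isna().mean(axis=1) < 0.5]
    # 5. Fill the remaining gaps with the column mean.
    df = df.fillna(df.mean())
    # 6. Drop near-constant features, then min-max normalize the rest.
    df = df.loc[:, df.var() > 1e-8]
    return (df - df.min()) / (df.max() - df.min())
```

The low-variance filter runs before normalization so the min-max division never hits a zero range.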
Training procedure
Separate the dataset into two subsets: one containing only the
positive class and the other only the negative class
Split each subset into training and testing sets (70% and 30%,
respectively)
Cluster each training subset with the k-means clustering algorithm into
k clusters, where k ∈ [1 : 20]
(k can differ between the positive and negative subsets)
Save the obtained centers
Test the performance of the selected combination of centers on the
test set
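A minimal sketch of this training loop using scikit-learn's KMeans; `X_pos` and `X_neg` are assumed per-class feature matrices, and the nearest-center scorer stands in for the scoring frame described later:

```python
# Cluster each class separately with k-means, then exhaustively score every
# (k_pos, k_neg) combination of saved centers on the held-out 30% splits.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

def nearest_center_accuracy(c_pos, c_neg, pos_te, neg_te):
    centers = np.vstack([c_pos, c_neg])
    labels = np.array([1] * len(c_pos) + [0] * len(c_neg))
    def predict(X):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        return labels[d.argmin(axis=1)]  # class of the nearest center
    hits = (predict(pos_te) == 1).sum() + (predict(neg_te) == 0).sum()
    return hits / (len(pos_te) + len(neg_te))

def train_centers(X_pos, X_neg, k_max=20, seed=0):
    # 70/30 split per class, as in the procedure above.
    pos_tr, pos_te = train_test_split(X_pos, test_size=0.3, random_state=seed)
    neg_tr, neg_te = train_test_split(X_neg, test_size=0.3, random_state=seed)
    # Cluster each training subset for every k in [1, k_max] and save the centers.
    cands_p = [KMeans(n_clusters=k, n_init=10, random_state=seed).fit(pos_tr).cluster_centers_
               for k in range(1, k_max + 1)]
    cands_n = [KMeans(n_clusters=k, n_init=10, random_state=seed).fit(neg_tr).cluster_centers_
               for k in range(1, k_max + 1)]
    # Return the best-scoring combination: (accuracy, pos_centers, neg_centers).
    return max(((nearest_center_accuracy(cp, cn, pos_te, neg_te), cp, cn)
                for cp in cands_p for cn in cands_n), key=lambda t: t[0])
```

With k_max = 20 this evaluates 400 center combinations, which is the exhaustive search revisited in the Future Work section.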
Testing procedure
For every point in the test set, find the nearest center to the test case
Calculate the classification score as illustrated in the next frame
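The nearest-center decision for a single test case can be written in a few lines of NumPy; the center arrays are assumed to come from the training step, and the names are placeholders:

```python
# Label a test point by the class of its closest saved k-means center.
import numpy as np

def classify(x, pos_centers, neg_centers):
    d_pos = np.linalg.norm(pos_centers - x, axis=1).min()  # distance to nearest positive center
    d_neg = np.linalg.norm(neg_centers - x, axis=1).min()  # distance to nearest negative center
    return "positive" if d_pos < d_neg else "negative"
```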
Slide-Detect Pre-processing
Separate the dataset classes into sample and control subsets
Normalize the images in each subset
Apply a series of rotation, translation, rescaling, flipping, and zoom
operations to both subsets
Save the resulting images to their corresponding subsets
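A NumPy-only sketch of the normalize-then-augment pass; a production pipeline would typically use a dedicated augmentation library, and the specific angles and shift sizes here are illustrative choices, not the deck's parameters. `img` is assumed to be a square 2-D grayscale array with side lengths divisible by 4:

```python
# Normalize one image to [0, 1], then emit rotated, translated, flipped,
# and zoomed variants for the augmented subset.
import numpy as np

def augment(img):
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)   # normalize to [0, 1]
    h, w = img.shape
    crop = img[h // 4: 3 * h // 4, w // 4: 3 * w // 4]         # central crop
    return [
        img,                             # original (normalized)
        np.rot90(img),                   # rotation
        np.roll(img, w // 8, axis=1),    # horizontal translation
        np.fliplr(img),                  # horizontal flip
        np.kron(crop, np.ones((2, 2))),  # 2x zoom of the central crop
    ]
```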
Figure 9: Score progress on the cervical cancer dataset as the number of
positive- and negative-class clusters changes
Figure 11: Age distribution among infiltration patients in the ChestXray-NIHCC dataset
Conclusion
This work proposes two algorithms for computer-aided diagnosis of
diabetes, cervical cancer, and lung infiltration
Both algorithms outperformed the state of the art on the same
datasets, achieving accuracies of 0.999 and 0.958 for GDD and 0.9333 for
Slide-Detect
Future Work
The GDD algorithm performs an exhaustive search for the optimum
number of clusters, which is computationally expensive. A binary
search algorithm may achieve the same results in logarithmic time
The Slide-Detect algorithm can be extended to support 3D scans
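The logarithmic-search idea can be sketched as follows. One concrete realization is a ternary-style search over k, which needs only O(log k_max) score evaluations instead of k_max, but it rests on the unverified assumption that the validation score is unimodal in k; `score` is a hypothetical callable mapping k to validation accuracy:

```python
# Ternary-style search for the k maximizing a unimodal score function,
# as a possible replacement for the exhaustive scan over k in [1, 20].
def search_k(score, lo=1, hi=20):
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if score(m1) < score(m2):
            lo = m1 + 1   # the maximum cannot lie at or below m1
        else:
            hi = m2       # the maximum cannot lie above m2
    # At most three candidates remain; check them directly.
    return max(range(lo, hi + 1), key=score)
```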