
Pneumonia Detection Challenge

Project Team
 Ashit
 Kavitha KH
 Pavithra N M
 Santhosh
Mentor: Soumen
Program Manager : Mohammad

Date: 7th Nov 2020



AGENDA

SL.No Topics
1 Problem Statement
2 About Pneumonia
3 Business Value
4 DICOM files & Medical images
5 Dataset Description
6 EDA and Pre-Processing
7 Approach/Papers
8 Model Building and Evaluation
9 Enhancement of Models

Project Timeline and Status

SL.No  Weeks        Milestone                  Tasks/Activities                    Target date    Status
1      Week 0 to 4  Interim Report Submission  Project release and Team Formation  23rd Oct 2020  Completed
2      Week 0 to 4  Interim Report Submission  Problem Interpretation              1st Nov 2020   Completed
3      Week 0 to 4  Interim Report Submission  EDA and Pre-Processing              8th Nov 2020   Completed
4      Week 0 to 4  Interim Report Submission  Modeling                            15th Nov 2020
5      Week 0 to 4  Interim Report Submission  Interim report submission           15th Nov 2020
6      Week 5 & 6   Final Report Submission    Model evaluation                    22nd Nov 2020
7      Week 5 & 6   Final Report Submission    Presentation and Report             29th Nov 2020
Problem Statement

The objective of this project is to explore the dataset for the Pneumonia Detection Challenge
and build a model that can automatically identify whether a patient is suffering from
pneumonia by looking at chest X-ray (CXR) images. The model has to be highly
accurate because patients' lives are at stake.

Pneumonia accounts for over 15% of all deaths of children under 5 years old internationally.
In 2015, 920,000 children under the age of 5 died from the disease.

We start by exploring the Train_labels, Test_labels and Class_info datasets and then the DICOM data.
We extract the meta information from the DICOM files and visualize the various features of
the DICOM images, grouped by age and sex.
About Pneumonia
Pneumonia is an infection in one or both lungs caused by bacteria, viruses or
fungi, causing inflammation in the air sacs of the lung, known as alveoli.

Common Symptoms :

Other Symptoms :

Other symptoms can vary according to age and general health :

• Children under 5 years old may have fast breathing or wheezing.
• Infants may appear to have no symptoms, but sometimes they may vomit, lack energy, or have
  trouble drinking or eating.
• Older people may have milder symptoms. They can also exhibit confusion or a
  lower-than-normal body temperature.

Other conditions that can also appear as increased opacity on a CXR :

• Fluid Overload (Pulmonary Edema)
• Bleeding
• Volume Loss (atelectasis)
• Lung Cancer / Surgical Changes
• Pleural Effusion
About Pneumonia
• Pneumonia is a leading killer of children younger than 5 years despite high
  vaccination coverage, improved nutrition, and widespread implementation of
  the Integrated Management of Childhood Illnesses algorithm.

• Assessing the effect of interventions on childhood pneumonia is challenging
  because the choice of case definition and surveillance approach can affect
  the identification of pneumonia substantially.

Detection Challenges :

• The CXR method counts on the presence of 'increased opacity in lung' to detect pneumonia, but
  increased opacity in the lung may be due to other reasons also. Thus a "Highly Trained Expert" and
  "Clinical Evidence" are generally required.
• More challenges : position of the patient & depth of inspiration.
• When available, comparison of CXRs of the patient taken at different time points and
  correlation with clinical symptoms and history are helpful in making the diagnosis.

Detection Methods :

• Blood Test : presence of microbes in blood
• Chest Radiograph (CXR) : presence of increased opacity
• Pulse Oximetry : oxygen level in blood
• Sputum Test : mucus specimen test from lung

[Figure: Healthy vs. Infected lung]
About Pneumonia

Chest Radiograph (CXR) : In this method, increased opacity has to be identified by locating
the position of inflammation in the CXR images.

• Tissues made of sparse material do not absorb X-rays. They appear ‘Black’ in CXR images.

• Bones, made of dense material, absorb X-rays. They appear ‘White’ in CXR images.

• Lung Opacity : Pneumonia
• Lung Opacity : But Not Pneumonia
• No Lung Opacity : Normal

AI Model

Objective

• To detect the presence of pneumonia using binary classification and create a bounding
  box around the affected region.
• These objectives have been attempted by different research groups. In this
  project we are trying to improve the mapping, identification and efficiency.

Further, it can be enhanced with image captioning and report generation using statistical
NLP.

Business Value

• Automating pneumonia screening in chest radiographs and providing details of the affected
  area through bounding boxes.

• Assisting physicians in making better clinical decisions, or even replacing human judgement in
  certain functional areas of healthcare (e.g., radiology).

• Guided by relevant clinical questions, powerful AI techniques can unlock clinically
  relevant information hidden in the massive amount of data, which in turn can assist
  clinical decision making.
DICOM files & Medical images

• Medical images are stored in a special format known as DICOM files (*.dcm).

• They contain a combination of header metadata as well as underlying
  raw image arrays for pixel data.

In Python

• One popular library to access and manipulate DICOM files is the pydicom module.
• Install the pydicom library.
• Find the DICOM file for a given patientId in the train_images folder.
• Use the pydicom.dcmread() method (named pydicom.read_file() in older releases) to load the data.
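
The steps above can be sketched as follows. This is a minimal sketch, assuming pydicom is installed and that each image is stored as <patientId>.dcm inside the stage_2_train_images folder (the file-naming convention is an assumption). The helper names dicom_path, load_dicom and extract_metadata are hypothetical; the header fields listed are standard DICOM keywords.

```python
from pathlib import Path

# Header fields of interest (standard DICOM keywords).
META_FIELDS = ("PatientSex", "PatientAge", "Modality",
               "BodyPartExamined", "ViewPosition", "Rows", "Columns")


def dicom_path(patient_id, folder="stage_2_train_images"):
    """Build the path of the DICOM file for one patient (assumed naming)."""
    return Path(folder) / f"{patient_id}.dcm"


def load_dicom(patient_id, folder="stage_2_train_images"):
    """Read one DICOM file; requires `pip install pydicom`."""
    import pydicom  # imported lazily so the other helpers work without it
    return pydicom.dcmread(dicom_path(patient_id, folder))


def extract_metadata(ds):
    """Collect the selected header fields of a loaded dataset into a dict.

    Fields that are absent from the header are returned as None.
    """
    return {name: getattr(ds, name, None) for name in META_FIELDS}
```

After ds = load_dicom(patient_id), ds.pixel_array gives the image as a NumPy array and extract_metadata(ds) returns the selected header fields as a dict.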
Dataset Description
The dataset is organized in several files and folders :

stage_2_train_labels.csv
    This file contains the training set of patient IDs and labels (including bounding boxes).
    Columns : [Patient ID, x-min, y-min, width, height and Target]

stage_2_detailed_class_info.csv
    This file contains the training set of detailed labels, i.e. Patient ID and Classification.
    It provides detailed information about the type of positive or negative class for each image.
    Classes : Normal / Lung Opacity / No Lung Opacity (Not Normal)

stage_2_train_images
    This folder contains the training set of raw image (DICOM) files. Each file contains a
    combination of header metadata as well as underlying raw image arrays for pixel data.

stage_2_test_images
    This folder contains the test set of raw image (DICOM) files. Each file contains a
    combination of header metadata as well as underlying raw image arrays for pixel data.

EDA and Pre-Processing

Data Variables :

• patientId - Each patientId corresponds to a unique image.
• x - the upper-left x coordinate of the bounding box.
• y - the upper-left y coordinate of the bounding box.
• width - the width of the bounding box.
• height - the height of the bounding box.
• Target - the binary Target, indicating whether this sample has evidence of pneumonia.
 Class ‘
EDA on Class Info

Class Info :

• Data consists of Patient ID & Class.
• 3 unique classes are present.
• Shape of class info : (30227, 2)
• There is no missing data.

Summary : Count Data

• No Lung Opacity / Not Normal = 39.11%
• Normal = 29.28%
• Lung Opacity = 31.61%
• The missing bounding-box percentage in the train labels (68.39%) is the sum of both
  "No Lung Opacity / Not Normal" and "Normal".
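
The percentages above are plain relative frequencies of the class column. A minimal sketch using only the standard library, run on a toy list of labels rather than the real file (the function name class_shares is hypothetical):

```python
from collections import Counter


def class_shares(labels):
    """Return the percentage share of each class, rounded to 2 decimals."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: round(100 * n / total, 2) for cls, n in counts.items()}


# Toy data for illustration only, not the real class distribution.
toy = ["Normal", "Lung Opacity", "Normal", "No Lung Opacity / Not Normal"]
shares = class_shares(toy)
print(shares)
```

Applied to the class column of stage_2_detailed_class_info.csv, the same function would yield the three shares listed above.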
EDA on Train Labels
Train Labels Info :
• Data consists of Patient ID & bounding box details with the Target variable.
• There are 26684 unique Patient IDs.
• Shape of train labels : (30227, 6)
• 68.39% of the entries (20672 of 30227) have no bounding box data.
• This missing-data percentage is the sum of both "No Lung Opacity / Not Normal" and "Normal".
• The describe() function is used to find the count, mean, standard deviation, minimum and
  maximum values.

Summary :

• All chest examinations with Target = 1 (pathology detected) are associated with
  class : Lung Opacity.
• The chest examinations with Target = 0 (no pathology detected) are either of
  class : Normal or class : No Lung Opacity / Not Normal.
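
Both observations can be checked mechanically. The sketch below works on a few made-up rows that mimic the merged label and class columns (the dictionary keys x, Target and class are assumed column names, and both function names are hypothetical):

```python
def missing_box_percent(rows):
    """Percentage of label rows that carry no bounding box (x is None)."""
    missing = sum(1 for r in rows if r["x"] is None)
    return round(100 * missing / len(rows), 2)


def target_matches_class(rows):
    """True if Target == 1 holds exactly for the rows of class 'Lung Opacity'."""
    return all((r["Target"] == 1) == (r["class"] == "Lung Opacity")
               for r in rows)


# Made-up rows for illustration; the real table has 30227 rows.
rows = [
    {"x": 264.0, "Target": 1, "class": "Lung Opacity"},
    {"x": None, "Target": 0, "class": "Normal"},
    {"x": None, "Target": 0, "class": "No Lung Opacity / Not Normal"},
    {"x": 562.0, "Target": 1, "class": "Lung Opacity"},
]
print(missing_box_percent(rows))   # share of rows without a box
print(target_matches_class(rows))  # consistency of Target and class
```

On the full data the same ratio is 20672 / 30227, about 68.39%, which equals 39.11% + 29.28%.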
EDA on Train Labels
Distribution of Lung Opacity & Bounding Box Distribution

[Figure panels: Variable X, Variable Y, Variable Width, Variable Height]

Summary :

• Sample of center points superposed with the corresponding sample of the rectangles

EDA on Dicom data

Critical information in the metadata :

• Patient sex
• Patient age
• Modality
• Body part examined
• View position
• Rows & Columns
• Pixel Spacing

• Number of images in train set : 26684 (unique patients)
• Number of images in test set : 3000
• Train class data shape is : (37629, 7)

EDA : Checking Dicom Images

Without Bounds

With Bounds

EDA Conclusion

• Males are more likely to develop community-acquired and nosocomial
  bacterial pneumonias than females.
• Furthermore, the severity of pneumonia appears greater in male patients,
  since males have a higher risk of hospitalization and mortality due
  to pneumonia.
AIML Capstone Project- Pneumonia Detection Challenge

13. EDA: Dicom Data


 Age Vs. Target Distribution( All Ages<=5 Years )

Team: SANKALPA

13. EDA: Dicom Data


 Sex Vs. Target Distribution( All Ages )  Sex Vs. class distribution( All Ages )


13. EDA: Dicom Data: Lung Opacity Distribution (Age Based)

 Age Group (1-20)  Age Group (21-40)  Age Group (>40)


13. EDA: Dicom Data: Lung Opacity Distribution (Sex Based)

 Female  Male


12. Approach

1. Pre-Processing, EDA and Data Visualization
2. Model Building : train different DL models on the data and choose the best model
   based on performance
3. Model Evaluation : evaluate the score and IOU, and improve the model based on the score
4. Model Tuning and Testing : improve the performance of the model by tuning different
   parameters and testing


Approach

STAGE 1: DATA PRE-PROCESSING, DATA VISUALISATIONS AND EDA

• Understand the problem statement and explore the data.

• Merge stage_2_train_labels.csv and stage_2_detailed_class_info.csv.

• Extract metadata from the DICOM images (stage_2_train_images) and store it in
  .csv/.pkl format.

• Treat missing values and class imbalance.

• Perform univariate analysis, and bivariate analysis between predictor and target columns.

• Visualize the analysis with histograms, pair plots and count plots, plot the different classes
  of image, and draw conclusions.
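
The merge step above can be sketched without third-party libraries. The snippet assumes both files have a header row with a patientId column (plus x, y, width, height, Target in the labels file and class in the class-info file). In-memory strings with made-up values stand in for the real CSV files, and the function name merge_labels is hypothetical. Since a patient can have several boxes, the sketch keeps one class per patient and attaches it to every label row instead of joining row against row.

```python
import csv
import io


def merge_labels(labels_csv, class_csv):
    """Attach the detailed class to every label row, keyed on patientId."""
    class_of = {row["patientId"]: row["class"]
                for row in csv.DictReader(class_csv)}
    merged = []
    for row in csv.DictReader(labels_csv):
        row["class"] = class_of.get(row["patientId"])
        merged.append(row)
    return merged


# In-memory stand-ins for the two CSV files (made-up values).
labels = io.StringIO(
    "patientId,x,y,width,height,Target\n"
    "p1,264,152,213,379,1\n"
    "p1,562,152,256,453,1\n"
    "p2,,,,,0\n"
)
classes = io.StringIO(
    "patientId,class\n"
    "p1,Lung Opacity\n"
    "p1,Lung Opacity\n"
    "p2,Normal\n"
)
merged = merge_labels(labels, classes)
print(len(merged), merged[2]["class"])
```

With the real data, the two StringIO objects would be replaced by files opened with open(path, newline="").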


Approach
STAGE 2: Model Building

Split the data into train, validation and test sets.

Train a basic CNN first and calculate the score.

Save the weights and the model for further improvement.

Apply different advanced architectures such as VGG-Net, Mask R-CNN and U-Net.

Choose the model that performs best.
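
For the split step, one common precaution is to split by patient ID rather than by label row, so that all boxes of one patient land in the same subset. This is an assumption, not something stated in the plan above. A minimal standard-library sketch, with a hypothetical function name and a 70/15/15 ratio chosen only for illustration:

```python
import random


def split_patients(patient_ids, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle unique patient IDs and cut them into train/val/test lists."""
    ids = sorted(set(patient_ids))       # de-duplicate, fixed start order
    random.Random(seed).shuffle(ids)     # reproducible shuffle
    n_test = round(len(ids) * test_frac)
    n_val = round(len(ids) * val_frac)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test


train, val, test = split_patients([f"p{i}" for i in range(100)])
print(len(train), len(val), len(test))
```

Label rows are then assigned to a subset according to the patient ID they belong to.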


Approach
STAGE 3 and 4 - Model Tuning, Testing and Evaluation

• Test the model and calculate mAP at different IoU thresholds.

• Plot the ROC curve and compute the AUC, which summarizes the trade-off between
  true positives and false positives.

• Improve the model by applying different :

   • optimization techniques - Adam and RMSprop.

   • Batch Normalization, Dropout, learning rate.

   • Early Stopping to prevent overfitting.

• Evaluate and observe the behavior of the model when tuning different hyperparameters.
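
The evaluation step relies on intersection over union (IoU) between predicted and ground-truth boxes. The sketch below computes IoU for boxes in the (x-min, y-min, width, height) format of the label file, plus a simplified per-image precision averaged over several IoU thresholds. The threshold range 0.40 to 0.75 in steps of 0.05, the TP / (TP + FP + FN) formula and the greedy matching are assumptions made for illustration; the exact metric should be taken from the challenge rules. All function names are hypothetical.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = min(ax + aw, bx + bw) - max(ax, bx)
    inter_h = min(ay + ah, by + bh) - max(ay, by)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0                      # boxes do not overlap
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union


def precision_at(threshold, preds, truths):
    """TP / (TP + FP + FN) for one image; preds sorted by confidence."""
    unmatched = list(truths)
    tp = 0
    for p in preds:
        # Greedily match each prediction to the best remaining truth box.
        best = max(unmatched, key=lambda t: iou(p, t), default=None)
        if best is not None and iou(p, best) > threshold:
            unmatched.remove(best)
            tp += 1
    fp = len(preds) - tp
    fn = len(unmatched)
    total = tp + fp + fn
    # No boxes on either side: counted as perfect here (an assumption).
    return tp / total if total else 1.0


def mean_precision(preds, truths, thresholds=None):
    """Average the per-threshold precision over a range of IoU thresholds."""
    if thresholds is None:
        thresholds = [0.40 + 0.05 * i for i in range(8)]  # 0.40 ... 0.75
    scores = [precision_at(t, preds, truths) for t in thresholds]
    return sum(scores) / len(scores)
```

Two identical boxes give an IoU of 1, and two 100 x 100 boxes shifted by 50 pixels along x overlap on half their area and give an IoU of 1/3.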


11. Predictions

• In this project, we have to predict whether pneumonia exists in a given image.
  This will be done by predicting bounding boxes around areas of the lung.

• Samples without bounding boxes are negative and contain no definitive
  evidence of pneumonia. Samples with bounding boxes indicate evidence of
  pneumonia.

• When making predictions, the model should predict as many bounding boxes
  as necessary, in the format : confidence x-min y-min width height

• There will be only ONE predicted row per image. This row may include
  multiple bounding boxes.

A properly formatted row may look like any of the following.
For patientIds with no predicted pneumonia / bounding boxes :
    0004cfab-14fd-4e49-80ba-63a80b6bddd6,
For patientIds with a single predicted bounding box :
    0004cfab-14fd-4e49-80ba-63a80b6bddd6,0.5 0 0 100 100
For patientIds with multiple predicted bounding boxes :
    0004cfab-14fd-4e49-80ba-63a80b6bddd6,0.5 0 0 100 100 0.5 0 0 100 100, etc.
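
The row format above can be produced with a small helper. This is a sketch; the function name format_row is hypothetical, and each box is assumed to be a tuple (confidence, x-min, y-min, width, height).

```python
def format_row(patient_id, boxes):
    """Return 'patientId,conf x y w h conf x y w h ...' for one image."""
    parts = []
    for conf, x, y, w, h in boxes:
        parts.append(f"{conf} {x} {y} {w} {h}")
    return f"{patient_id}," + " ".join(parts)


pid = "0004cfab-14fd-4e49-80ba-63a80b6bddd6"
print(format_row(pid, []))                       # no predicted boxes
print(format_row(pid, [(0.5, 0, 0, 100, 100)]))  # one predicted box
```

An image with no predicted box yields just the patient ID followed by a comma, matching the first example row.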


15. Modeling :

https://www.thelancet.com/journals/lanres/article/PIIS2213-2600(19)30249-8/fulltext
