
Problem Statement

- Pneumothorax is a medical condition characterized by the presence of air in the pleural space, causing lung collapse.
- According to clinical literature, a small pneumothorax may not require immediate treatment, so some segmentation inaccuracy can be tolerated. However, correctly classifying images as pneumothorax or non-pneumothorax remains crucial.
- Previous models focused primarily on image segmentation and did not consider the importance of accurate classification.
- A multimodal approach uses both text reports and images to improve the accuracy of pneumothorax diagnosis and classification.
Contributions
Base model: CRIS (CLIP-Driven Referring Image Segmentation)

i) Changing the image encoder

ii) Changing the loss function

iii) Removing the need for text during inference

iv) Changing the projection module to avoid interpolation


Changing Image Encoder
CRIS uses a ResNet image encoder by default; we replaced it with:

i) UNet

ii) MultiResUNet (with varying weights for the n×n convolutions)
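As we understand the MultiResUNet paper, each MultiRes block replaces a single large n×n convolution with a chain of three 3×3 convolutions whose outputs are concatenated (covering 3×3, 5×5 and 7×7 receptive fields), and splits a filter budget W = α·U (α = 1.67, U = the filter count of the matching plain UNet level) across them as W/6, W/3 and W/2. The slides do not say exactly what "varying weights" means for the two MultiResUNet variants; this minimal sketch assumes it refers to that split and exposes it as a hypothetical `fractions` parameter.

```python
def multires_filters(u, alpha=1.67, fractions=(1 / 6, 1 / 3, 1 / 2)):
    """Filter counts for the three chained 3x3 convs inside one MultiRes block.

    u: filter count of the corresponding plain UNet level.
    alpha: scales u into the block's total budget w = alpha * u.
    fractions: hypothetical knob for the "varying weights" variants; the share
    of w given to each successive 3x3 conv (defaults follow MultiResUNet).
    """
    w = alpha * u
    counts = tuple(int(w * f) for f in fractions)
    # The residual 1x1 shortcut must output the concatenated channel count.
    return counts, sum(counts)


def stacked_receptive_field(num_3x3_convs):
    """Receptive field side of num_3x3_convs stride-1 3x3 convs: 3, 5, 7, ..."""
    return 2 * num_3x3_convs + 1
```

For example, `multires_filters(32, fractions=(1 / 3, 1 / 3, 1 / 3))` would model an equal split instead of the default 1:2:3 ratio; chaining small kernels keeps the 5×5 and 7×7 context while using fewer parameters than real 5×5 or 7×7 kernels.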


Loss Function
CRIS uses binary cross-entropy (BCE) loss by default

New loss function: Dice_with_sigmoid + BCE

The updated loss function significantly increases performance
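The slides name the new loss only as Dice_with_sigmoid + BCE. A minimal sketch, assuming it means a soft Dice loss computed on sigmoid probabilities plus BCE on the same logits, with equal weights and a smoothing constant of 1 (both assumptions, not stated in the slides). It is written in plain Python over flattened pixel lists so it runs without a deep-learning framework; in training the same arithmetic would run on tensors.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy computed directly from logits (stable form)."""
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def dice_loss_with_sigmoid(logits, targets, smooth=1.0):
    """Soft Dice loss: 1 - (2*|P.T| + s) / (|P| + |T| + s) on sigmoid outputs."""
    probs = [sigmoid(x) for x in logits]
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def dice_bce_loss(logits, targets, dice_weight=1.0, bce_weight=1.0):
    """Hypothetical combined loss: weighted sum of soft Dice and BCE."""
    return (dice_weight * dice_loss_with_sigmoid(logits, targets)
            + bce_weight * bce_with_logits(logits, targets))
```

One common motivation for adding a Dice term is class imbalance: a pneumothorax mask usually covers a small fraction of the image, so per-pixel BCE alone is dominated by background pixels, while Dice scores the overlap of the foreground region directly.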


Removing text during inference
Changed projection module

- Removed the final layer, in which cross-convolution between the image and text encodings took place

- The final projector layer now contains only upsampling layers and plain convolutions

Also made minor changes to the neck and the transformer
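In CRIS, the projector turns the text embedding into the weights and bias of a convolution that is applied to the upsampled image features (the cross-convolution), so a mask cannot be produced without text. This sketch contrasts that with a text-free head in which the kernel is an ordinary learned parameter. It is a simplification under stated assumptions: one output channel, a 1×1 kernel (CRIS's own kernel may be larger), nearest-neighbour 2× upsampling, and nested Python lists standing in for tensors. All function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a C x H x W feature map (nested lists)."""
    return [
        [[v for v in row for _ in range(2)] for row in channel for _ in range(2)]
        for channel in fmap
    ]


def conv1x1(fmap, weights, bias):
    """Single-output 1x1 convolution: per pixel, dot(weights, channel values) + bias."""
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [
        [sum(weights[c] * fmap[c][i][j] for c in range(channels)) + bias
         for j in range(width)]
        for i in range(height)
    ]


def matvec(matrix, vec):
    """Plain matrix-vector product (a linear layer without bias)."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def text_conditioned_head(fmap, text_vec, text_proj):
    """Simplified CRIS-style cross-convolution: the text embedding is projected
    to C kernel weights plus one bias, which are then applied to the image."""
    params = matvec(text_proj, text_vec)  # length C + 1, depends on the text
    return conv1x1(upsample2x(fmap), params[:-1], params[-1])


def text_free_head(fmap, weights, bias):
    """Text-free projector: upsampling followed by a plain convolution whose
    weights are ordinary learned parameters, so no text is needed at inference."""
    return conv1x1(upsample2x(fmap), weights, bias)
```

The only structural difference is where the kernel comes from: `text_conditioned_head` cannot run without `text_vec`, whereas `text_free_head` needs only the image features.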


Classification accuracy

Model                             Fold 0   Fold 1   Fold 2
CRIS-multiresunet1                0.892    0.873    0.885
CRIS-multiresunet1-without-text   0.190    0.420    0.195
CRIS-multiresunet2                0.925    0.809    0.839
lvit                              0.807    0.832    0.763
unet                              0.595    0.489    0.432
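The slides do not state how the image-level pneumothorax / non-pneumothorax label is obtained. One common approach, assumed here, is to call an image positive when its predicted mask contains at least `min_area` pixels above a probability threshold. `mask_to_label`, `prob_threshold` and `min_area` are hypothetical names and defaults, not taken from the slides.

```python
def mask_to_label(prob_mask, prob_threshold=0.5, min_area=1):
    """Hypothetical rule: positive (1) if at least `min_area` pixels of the
    predicted probability mask exceed `prob_threshold`, else negative (0)."""
    area = sum(1 for row in prob_mask for p in row if p > prob_threshold)
    return 1 if area >= min_area else 0


def classification_accuracy(prob_masks, labels, prob_threshold=0.5, min_area=1):
    """Fraction of images whose mask-derived label matches the ground-truth label."""
    preds = [mask_to_label(m, prob_threshold, min_area) for m in prob_masks]
    correct = sum(1 for pred, label in zip(preds, labels) if pred == label)
    return correct / len(labels)
```

Raising `min_area` suppresses tiny spurious blobs that would otherwise turn a negative image into a false positive, at the cost of possibly missing very small pneumothoraces.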


Avg. Dice score (medium and large)
Each cell lists: medium, large, avg

Model                             Fold 0                Fold 1                Fold 2
CRIS-multiresunet1                0.656, 0.776, 0.700   0.626, 0.815, 0.701   0.685, 0.817, 0.736
CRIS-multiresunet1-without-text   0.562, 0.607, 0.578   0.493, 0.684, 0.569   0.536, 0.641, 0.576
CRIS-multiresunet2                0.630, 0.764, 0.679   0.579, 0.726, 0.637   0.671, 0.798, 0.719
lvit                              0.638, 0.790, 0.694   0.593, 0.803, 0.646   0.643, 0.803, 0.703
unet                              0.653, 0.790, 0.703   0.605, 0.816, 0.689   0.657, 0.784, 0.705
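A sketch of how per-group average Dice scores like these can be computed, assuming binary masks flattened to 0/1 lists, size-group labels (small, medium, large, negative) assigned beforehand by an area rule the slides do not give, and Dice defined as 1.0 when both masks are empty (a common convention, assumed here). The avg column is not the simple mean of the medium and large columns (for CRIS-multiresunet1, fold 0: (0.656 + 0.776) / 2 = 0.716, not 0.700), so the sketch assumes it is pooled over all medium and large images. Function names are hypothetical.

```python
from collections import defaultdict


def dice(pred, gt):
    """Dice coefficient of two flattened binary (0/1) masks.
    Assumed convention: two empty masks count as a perfect match (1.0)."""
    inter = sum(p * g for p, g in zip(pred, gt))
    total = sum(pred) + sum(gt)
    return 1.0 if total == 0 else 2.0 * inter / total


def mean_dice_by_group(preds, gts, groups):
    """Average per-image Dice within each size group, e.g. 'small', 'medium'."""
    buckets = defaultdict(list)
    for pred, gt, group in zip(preds, gts, groups):
        buckets[group].append(dice(pred, gt))
    return {group: sum(s) / len(s) for group, s in buckets.items()}


def pooled_mean_dice(preds, gts, groups, include):
    """Average Dice over all images whose group is in `include` (image-weighted)."""
    scores = [dice(p, g) for p, g, grp in zip(preds, gts, groups) if grp in include]
    return sum(scores) / len(scores)
```

Because the pooled average weights every image equally, it leans toward whichever group has more images, which would explain why each avg value sits closer to the medium score than to the large one.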


Avg. Dice score (negative)

Model                             Fold 0   Fold 1   Fold 2
CRIS-multiresunet1                0.875    0.729    0.864
CRIS-multiresunet1-without-text   0.028    0.316    0.039
CRIS-multiresunet2                0.917    0.732    0.809
lvit                              0.774    0.807    0.719
unet                              0.523    0.392    0.369
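For a negative image the ground-truth mask G is empty, so Dice behaves almost like a yes/no check. Assuming binary predicted masks P and the common convention that two empty masks score 1 (the slides do not state which convention was used), the average over the N negative images reduces to the fraction of negative images with an empty prediction:

```latex
\begin{aligned}
\mathrm{Dice}(P, G) &= \frac{2\,|P \cap G|}{|P| + |G|} \\
\mathrm{Dice}(P, \varnothing) &= \frac{2 \cdot 0}{|P|} = 0 \quad \text{if } P \neq \varnothing \\
\mathrm{Dice}(\varnothing, \varnothing) &:= 1 \quad \text{(assumed convention; the formula gives } 0/0\text{)} \\
\frac{1}{N} \sum_{k=1}^{N} \mathrm{Dice}(P_k, \varnothing) &= \frac{\left|\{\, k : P_k = \varnothing \,\}\right|}{N}
\end{aligned}
```

Under this convention, a score of 0.028 for CRIS-multiresunet1-without-text in fold 0 would mean that a non-empty mask was predicted for roughly 97% of the negative images, which is consistent with its low classification accuracy.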


Avg. Dice score (small)

Model                             Fold 0   Fold 1   Fold 2
CRIS-multiresunet1                0.400    0.406    0.397
CRIS-multiresunet1-without-text   0.380    0.303    0.345
CRIS-multiresunet2                0.371    0.332    0.398
lvit                              0.370    0.392    0.385
unet                              0.329    0.377    0.324


Conclusion
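Our model (with text) performs better than the lvit and unet baselines on:
i) the classification task
ii) segmentation of small pneumothorax (ptx)

Removing text during inference reduces performance significantly; classification accuracy drops from 0.873 or higher to at most 0.420 across folds, and the negative-case Dice score drops from at least 0.729 to at most 0.316

For medium and large ptx segmentation, there is no significant difference relative to the other models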
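Averaging the per-fold numbers from the classification accuracy and small-pneumothorax Dice tables gives a quick check of the first two conclusions. The values below are copied from those tables; the dictionary names and helper functions are only illustrative.

```python
from statistics import mean

# Per-fold values (fold 0, fold 1, fold 2) copied from the results tables.
CLASSIFICATION_ACCURACY = {
    "CRIS-multiresunet1": [0.892, 0.873, 0.885],
    "CRIS-multiresunet1-without-text": [0.190, 0.420, 0.195],
    "CRIS-multiresunet2": [0.925, 0.809, 0.839],
    "lvit": [0.807, 0.832, 0.763],
    "unet": [0.595, 0.489, 0.432],
}
SMALL_PTX_DICE = {
    "CRIS-multiresunet1": [0.400, 0.406, 0.397],
    "CRIS-multiresunet1-without-text": [0.380, 0.303, 0.345],
    "CRIS-multiresunet2": [0.371, 0.332, 0.398],
    "lvit": [0.370, 0.392, 0.385],
    "unet": [0.329, 0.377, 0.324],
}


def fold_means(table):
    """Mean over the three folds for each model, rounded to 3 decimals."""
    return {model: round(mean(scores), 3) for model, scores in table.items()}


def best_model(table):
    """Model with the highest mean across folds."""
    return max(table, key=lambda model: mean(table[model]))


for name, table in [("classification", CLASSIFICATION_ACCURACY),
                    ("small-ptx Dice", SMALL_PTX_DICE)]:
    print(name, best_model(table), fold_means(table))
```

On these numbers CRIS-multiresunet1 has the highest three-fold mean on both metrics (about 0.883 accuracy and 0.401 small-ptx Dice, against 0.801 and 0.382 for lvit); with only three folds, these means do not by themselves establish statistical significance.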
