6.1.1 Segmentation and Classification

Figure 4 gives an overview of the approach for simultaneous nuclear instance segmentation and classification. When no classification labels are available, the network produces the instance segmentation as shown in (a). The different colors of the nuclear boundaries represent different types of nuclei in (b).

\[
\mathrm{Sobel}_m = \max\big(H_x(p_x),\, H_y(p_y)\big) \tag{1}
\]

where p_x and p_y refer to the horizontal and vertical predictions at the output of the HoVer branch, and H_x and H_y refer to the horizontal and vertical components of the Sobel operator. Specifically, H_x and H_y compute the horizontal and vertical derivative approximations and are shown by the gradient maps. The Sobel map highlights areas where there is a significant difference between neighbouring pixels within the horizontal and vertical maps. Therefore, areas such as the ones shown by the arrows will result in high values of the Sobel operator. We compute markers

\[
\mathrm{Dice} = 1 - \frac{2 \times \sum_{i=1}^{N} \big(Y_i(I) \times X_i(I)\big) + \epsilon}{\sum_{i=1}^{N} Y_i(I) + \sum_{i=1}^{N} X_i(I) + \epsilon} \tag{5}
\]
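Equation (1) can be sketched in code. The following is a minimal NumPy sketch (the function names, the edge padding, and the cross-correlation convention are my assumptions, not the paper's implementation): it applies the two 3x3 Sobel derivative kernels to the horizontal and vertical HoVer predictions and takes the pixel-wise maximum.

```python
import numpy as np

def sobel_filter(img, axis):
    """Cross-correlate img with a 3x3 Sobel kernel.

    axis=1 approximates the horizontal derivative (H_x),
    axis=0 the vertical derivative (H_y).
    """
    kx = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])
    k = kx if axis == 1 else kx.T
    h, w = img.shape
    p = np.pad(img, 1, mode="edge")   # replicate borders so output keeps shape
    out = np.zeros((h, w))
    for i in range(3):                # accumulate the 9 shifted windows
        for j in range(3):
            out += k[i, j] * p[i:i + h, j:j + w]
    return out

def sobel_map(p_x, p_y):
    """Pixel-wise Sobel_m = max(H_x(p_x), H_y(p_y)) as in Eq. (1)."""
    return np.maximum(sobel_filter(p_x, axis=1), sobel_filter(p_y, axis=0))
```

A high value of Sobel_m flags a strong jump in the predicted horizontal or vertical distance map, which is exactly the behaviour at instance boundaries that the marker computation exploits.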
where L_NP denotes the loss for the NP-branch, L_HV the loss for the HV-branch, L_NT the loss for the NT-branch, and L_TC the loss for the TC-branch. Overall, the individual branch losses are composed of the following weighted loss functions:

\[
\begin{aligned}
\mathcal{L}_{NP} &= \lambda_{NP_{FT}}\,\mathcal{L}_{FT} + \lambda_{NP_{DICE}}\,\mathcal{L}_{DICE} \\
\mathcal{L}_{HV} &= \lambda_{HV_{MSE}}\,\mathcal{L}_{MSE} + \lambda_{HV_{MSGE}}\,\mathcal{L}_{MSGE} \\
\mathcal{L}_{NT} &= \lambda_{NT_{FT}}\,\mathcal{L}_{FT} + \lambda_{NT_{DICE}}\,\mathcal{L}_{DICE} + \lambda_{NT_{BCE}}\,\mathcal{L}_{BCE} \\
\mathcal{L}_{TC} &= \lambda_{TC}\,\mathcal{L}_{CE}
\end{aligned}
\]

with the individual segmentation losses

\[
\mathcal{L}_{BCE} = -\frac{1}{N_{px}} \sum_{i=1}^{N_{px}} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c}) \tag{7}
\]

\[
\mathcal{L}_{DICE} = 1 - \frac{2 \times \sum_{i=1}^{N_{px}} y_{i,c}\,\hat{y}_{i,c} + \varepsilon}{\sum_{i=1}^{N_{px}} y_{i,c} + \sum_{i=1}^{N_{px}} \hat{y}_{i,c} + \varepsilon} \tag{8}
\]

and the cross-entropy as tissue classification loss:

\[
\mathcal{L}_{CE} = -\sum_{c=1}^{C_T} y_c^T \log(\hat{y}_c^T), \quad C_T = 19, \tag{9}
\]

with the contribution of each branch loss to the total loss (6) controlled by the hyperparameters λ_i. L_MSE denotes the mean squared error of the horizontal and vertical distance maps and L_MSGE the mean squared error of the gradients of the horizontal and vertical distance maps, each summarized for both directions separately. In the segmentation losses (7)–(9), y_{i,c} is the ground truth and ŷ_{i,c} the predicted probability of the i-th pixel belonging to class c, C the total number of nuclei classes, N_px the total number of pixels, ε a smoothness factor, and α_FT, β_FT, and γ_FT are hyperparameters of the Focal Tversky loss L_FT. The Cross-Entropy loss (9) and Dice loss (8) are commonly used in semantic segmentation. To address the challenge of underrepresented instance classes, the Focal Tversky loss, a generalization of the Tversky loss, is used. The Focal Tversky loss places greater emphasis on accurately classifying underrepresented instances by assigning higher weights to those samples. This weighting enhances the model's capacity to handle class imbalance and focuses its learning on the more challenging regions of the segmentation task.

7.3. Experiments

Moving on to our main contribution, utilizing CellViT for our research project, we made the strategic decision to replace the CellViT encoder with a MedSAM encoder. This encoder, fundamentally the same as the SAM encoder developed by Meta, has been retrained entirely on medical data. Our intuition behind this modification was grounded in the belief that integrating medical-specific data on top of the more generic data used in Meta's SAM model would yield better results. By tailoring the encoder to reflect the complexity and nuances of medical imagery more closely, we expected an improvement in the model's performance, particularly in the segmentation and classification of cell nuclei within the histopathology images of PanNuke. As the MedSAM encoder uses pretrained SAM-B weights, we also trained CellViT with SAM-B to keep our results consistent.

7.4. Results

Initially, CellViT was trained using one Nvidia 1080 Ti GPU for 130 epochs, with the encoder frozen for the first 25 epochs. To align with the original training, we used the exact same configuration as in the original paper, with learning-rate scheduling (scheduling factor 0.85) to gradually reduce the learning rate during training. However, due to limited computational resources, we trained for only 40 epochs, which took 36 hours. Even so, we already obtained significant results with only 40 epochs.

Tissue Type     CellViT (SAM-B)   CellViT (MedSAM)
Ovarian         0.8398            0.8362
Thyroid         0.7976            0.8219
Stomach         0.8546            0.8605
Uterus          0.8007            0.7818
Adrenal Gland   0.8009            0.8148
Bladder         0.75449           0.7411
Bile Duct       0.7607            0.7773
Liver           0.8427            0.861
Head and Neck   0.6586            0.6318
Pancreatic      0.8275            0.8457
Breast          0.8124            0.8114
Prostate        0.7973            0.8301
Testis          0.8083            0.8135
Colon           0.6833            0.7075
Esophagus       0.8192            0.8135
Cervix          0.6991            0.7625
Kidney          0.7417            0.8204
Skin            0.7201            0.7086
Lung            0.8019            0.8059