Uncertainty Quantification Using Dynamic Label Factor Allocation

Wei Ji 1,2,*, Wenting Chen 1,3,*, Shuang Yu 1, Kai Ma 1, Li Cheng 2, Linlin Shen 3, and Yefeng Zheng 1

1 Tencent HealthCare
{shirlyyu, kylekma, yefengzheng}@tencent.com
2 University of Alberta
{wji3, lcheng5}@ualberta.ca
3 School of Computer Science, Shenzhen University, Shenzhen, China
chenwenting2017@email.szu.edu.cn, llshen@szu.edu.cn
1 Introduction
Along with the development of deep learning, the performance of image segmentation [1, 13, 12] and object detection [4, 10, 11] has reportedly reached human level on some specific tasks, for example, longitudinal brain tumor volumetry segmentation [6]. Many medical datasets are labeled by multiple experts, so as to mitigate subjective bias and potential problems caused by different levels of clinical domain knowledge, negligence of subtle symptoms, and varying image quality [8]. The final ground-truth label is then generally obtained via majority vote, weighted averaging of the raw gradings, or other fusion techniques.
* Wei Ji and Wenting Chen contributed equally to this work.
[Fig. 1: Overview of the proposed framework (figure not reproducible in text). It illustrates the three label factor allocation mechanisms for six raters: (a) average weight assignment, where every rater receives a factor of 1/6; (b) label sampling, where a one-hot factor selects a single rater's label; and (c) random weight assignment with normalized random factors (e.g., 0.1, 0.2, 0.4). In each case, the 6×1×1 label factor is expanded to 6×16×16 and converted by a convolution into 512×16×16 factor maps; each rater branch produces a 2×512×512 prediction that is supervised against the corresponding 2×512×512 rater label with a BCE loss, and the weighted per-rater predictions are summed for the final output.]
2 Method
Fig. 1 illustrates the overview of the proposed uncertainty estimation method for medical image segmentation using dynamic label factor allocation among multiple raters. The overall architecture is composed of three parts: (1) the segmentation network, (2) the dynamic label factor allocation, and (3) the individual training loss for the prediction of each rater. In this section, we use prostate segmentation as an example task to describe our method.
As for the segmentation network, we adopt the widely used U-Net [7] with a pretrained ResNet-34 as the encoder. The decoder contains six outputs corresponding to the six raters, and each output has two channels for the two prostate segmentation tasks. To exploit the importance of the raw annotations labeled by different raters, we apply a dynamic label factor allocation mechanism that randomly generates weights reflecting the importance of individual raters, based on which the ground-truth is dynamically obtained. We concatenate the dynamic label factor with the embedding features encoded by the encoder of the segmentation network, and feed the concatenation to the decoder to generate the prediction for each rater. Then, we multiply the prediction probability map pixel-wise with the corresponding label factor for each rater to obtain the output prediction of that branch. Similarly, the pixel-wise multiplication of each rater's raw annotation with the dynamic label factor serves as the ground-truth for that branch. Afterwards, the model is supervised via binary cross entropy losses on both the raw annotations and the fused ground-truth, i.e., 1) the prediction of each of the six raters against the corresponding raw annotation, and 2) the fused prediction against the fused ground-truth label. Finally, the model prediction is obtained by summing the weighted predictions of all individual raters.
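The fusion of per-rater outputs described above can be sketched in plain Python (a toy, list-based version with hypothetical helper names; the paper's implementation presumably operates on full network tensors):

```python
def fuse_predictions(rater_preds, label_factor):
    """Weight each rater's probability map by its label factor and sum them.

    rater_preds: list of N maps, each an H x W nested list of probabilities.
    label_factor: list of N weights that sum to 1.
    """
    n = len(rater_preds)
    h = len(rater_preds[0])
    w = len(rater_preds[0][0])
    fused = [[0.0] * w for _ in range(h)]
    for i in range(n):
        for row in range(h):
            for col in range(w):
                fused[row][col] += label_factor[i] * rater_preds[i][row][col]
    return fused
```

With factors summing to 1, the fused map stays a valid probability map, which is what allows it to be compared against the fused ground-truth.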
Considering that each rater has a different confidence level for the task due to different clinical expertise, we introduce a novel training strategy, the dynamic label factor allocation, to exploit the importance of different raters and further improve model generalization by using different label factors as an indirect data augmentation strategy. The dynamic allocation strategy consists of three types of label factor allocation mechanisms, i.e., (1) average weight assignment, (2) label sampling, and (3) random weight assignment.
As for the average weight assignment, it first provides the label factor z ∈ R^{N×1×1}, where N denotes the number of raters and each element of z is equal to 1/N, meaning that every rater has the same confidence. Then, we expand the label factor z to the dimension of N × H × W, where H and W denote the height and width of the embedding features f ∈ R^{C×H×W} encoded by the encoder of the segmentation network, and C represents the number of channels of f. Afterwards, a 1×1 convolutional layer is applied to expand the factor maps to the same channel number as the feature map f, yielding M ∈ R^{C×H×W}. Finally, the factor maps M and the embedding features f are concatenated and fed into the decoder to generate the prediction for each rater.
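Because the expanded factor is spatially constant, the 1×1 convolution reduces to one matrix-vector product applied at every position. A minimal, list-based Python sketch (hypothetical names; a real implementation would use a learned convolution layer):

```python
def make_factor_maps(z, conv_weight, h, w):
    """Expand label factor z (length N) spatially, then apply a 1x1 conv
    (weights: C x N) to obtain factor maps M of shape C x H x W."""
    n, c = len(z), len(conv_weight)
    # 1x1 conv on a spatially constant input: every pixel gets conv_weight @ z
    pixel = [sum(conv_weight[k][i] * z[i] for i in range(n)) for k in range(c)]
    # Broadcast the C-vector over all H x W positions
    return [[[pixel[k]] * w for _ in range(h)] for k in range(c)]
```

The resulting M is then concatenated channel-wise with the encoder features f before being fed to the decoder.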
Regarding the label sampling and random weight assignment mechanisms, the processes of generating factor maps are the same as that of the average weight allocation mechanism. In terms of the label factor, the label sampling mechanism randomly sets one element of the label factor to 1 and the remaining elements to 0, while the random weight allocation mechanism assigns each element of the label factor a random probability and then normalizes the label factor so that its elements sum to 1.
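The three allocation mechanisms can be summarized in a small, self-contained Python sketch (hypothetical function name; the paper presumably draws a fresh factor dynamically during training):

```python
import random

def label_factor(n, mode, rng=random):
    """Generate a length-n label factor under one of the three mechanisms."""
    if mode == "average":   # (1) every rater gets the same weight 1/n
        return [1.0 / n] * n
    if mode == "sampling":  # (2) one-hot: pick a single rater's label
        z = [0.0] * n
        z[rng.randrange(n)] = 1.0
        return z
    if mode == "random":    # (3) random weights normalized to sum to 1
        raw = [rng.random() for _ in range(n)]
        total = sum(raw)
        return [v / total for v in raw]
    raise ValueError(f"unknown mode: {mode}")
```

All three mechanisms produce a factor whose elements sum to 1, so the weighted combination of rater labels remains a valid soft ground-truth.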
To obtain the final prediction of each rater, we multiply the prediction of each rater $y'_i$ ($i \in \{1, 2, \ldots, N\}$) by the corresponding label factor $r_i$. To construct the weighted label of each rater $y_i$ ($i \in \{1, 2, \ldots, N\}$), we also perform pixel-wise multiplication of the real label given by each rater and the corresponding label factor. Afterwards, we compute the binary cross entropy loss between the final prediction of each rater ($y'_i \cdot r_i$) and the weighted label of each rater ($y_i \cdot r_i$) to supervise the training of the segmentation network. In addition, we sum over the final predictions of all raters and obtain the final prediction ($\sum_{i=1}^{N} (y'_i \cdot r_i)$) for our framework. The objective functions are defined as:
$$\mathrm{Loss} = \lambda_1 \, \mathrm{BCE}\Big(\sum_{i=1}^{N} (y_i \cdot r_i),\ \sum_{i=1}^{N} (y'_i \cdot r_i)\Big) + \lambda_2 \sum_{i=1}^{N} \mathrm{BCE}(y_i \cdot r_i,\ y'_i \cdot r_i) \qquad (1)$$

$$\mathrm{BCE}(\mathrm{target}, \mathrm{pred}) = -\sum_{c=1}^{2} \mathrm{target}_c \log(\mathrm{pred}_c) \qquad (2)$$
As shown in Eq. 1, the first term is the binary cross entropy loss between the final prediction (i.e., the weighted summation of all raters' predictions) and the overall fused ground-truth, while the second term is the loss between the weighted prediction of each rater and the corresponding weighted raw annotation. We set both λ1 and λ2 to 0.5, and c denotes the c-th class of the output.
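A minimal numeric sketch of Eqs. (1) and (2) for a single pixel (plain Python with hypothetical names; a real implementation would vectorize over all pixels, and the small `eps` is our own numerical-stability assumption):

```python
import math

def bce(target, pred, eps=1e-7):
    # Eq. (2): cross entropy over the two output channels at one pixel
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))

def total_loss(labels, preds, r, lam1=0.5, lam2=0.5):
    """Eq. (1) evaluated at a single pixel for N raters.

    labels[i], preds[i]: 2-channel values y_i and y'_i for rater i.
    r: label factor of length N.
    """
    n = len(r)
    fused_label = [sum(r[i] * labels[i][c] for i in range(n)) for c in range(2)]
    fused_pred = [sum(r[i] * preds[i][c] for i in range(n)) for c in range(2)]
    per_rater = sum(
        bce([r[i] * labels[i][c] for c in range(2)],
            [r[i] * preds[i][c] for c in range(2)])
        for i in range(n)
    )
    return lam1 * bce(fused_label, fused_pred) + lam2 * per_rater
```

Note that a rater with factor r_i = 0 contributes nothing to either term, which is what makes label sampling a special case of the same objective.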
3 Experiments
Table 1. Comparison with existing methods on four different QUBIQ datasets (%).
We compare our approach with three main methods, i.e., the commonly used majority voting method [9], the label sampling method, which randomly selects a label from the label pool of multiple annotations [3], and the weighted doctor net, which predicts individual raters' annotations using multiple branches [2]. As listed in Table 1, the weighted doctor net surpasses the majority voting and label sampling methods, indicating that the labels given by different raters should all be utilized, rather than only a single rater's label or only the fused ground-truth. Moreover, our proposed method achieves the best performance and outperforms the three methods above by a large margin, indicating that it properly exploits the importance of individual raters to better quantify the uncertainty.
References
1. Chen, W., Yu, S., Wu, J., Ma, K., Bian, C., Chu, C., Shen, L., Zheng, Y.: TR-GAN: Topology ranking GAN with triplet loss for retinal artery/vein classification. arXiv preprint arXiv:2007.14852 (2020)
2. Guan, M.Y., Gulshan, V., Dai, A.M., Hinton, G.E.: Who said what: Modeling
individual labelers improves classification. arXiv preprint arXiv:1703.08774 (2017)
3. Jensen, M.H., Jørgensen, D.R., Jalaboi, R., Hansen, M.E., Olsen, M.A.: Improving
uncertainty estimation in convolutional neural networks using inter-rater agree-
ment. In: International Conference on Medical Image Computing and Computer-
Assisted Intervention. pp. 540–548. Springer (2019)
4. Ji, W., Li, J., Zhang, M., Piao, Y., Lu, H.: Accurate RGB-D salient object detection via collaborative learning. In: European Conference on Computer Vision (ECCV) (2020)
5. Jungo, A., Meier, R., Ermis, E., Blatti-Moreno, M., Herrmann, E., Wiest, R.,
Reyes, M.: On the effect of inter-observer variability for a reliable estimation of
uncertainty of medical image segmentation. In: International Conference on Medi-
cal Image Computing and Computer-Assisted Intervention. pp. 682–690. Springer
(2018)
6. Meier, R., Knecht, U., Loosli, T., Bauer, S., Slotboom, J., Wiest, R., Reyes, M.:
Clinical evaluation of a fully-automatic segmentation method for longitudinal brain
tumor volumetry. Scientific reports 6, 23376 (2016)
7. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 234–241. Springer (2015)
8. Schaekermann, M., Beaton, G., Habib, M., Lim, A., Larson, K., Law, E.: Under-
standing expert disagreement in medical data analysis through structured adjudi-
cation. Proceedings of the ACM on Human-Computer Interaction 3(CSCW), 1–23
(2019)
9. Yu, S., Zhou, H.Y., Ma, K., Bian, C., Chu, C., Liu, H., Zheng, Y.: Difficulty-
aware glaucoma classification with multi-rater consensus modeling. arXiv preprint
arXiv:2007.14848 (2020)
10. Zhang, M., Ji, W., Piao, Y., Li, J., Zhang, Y., Xu, S., Lu, H.: LFNet: Light field fusion network for salient object detection. IEEE TIP 29, 6276–6287 (2020)
11. Zhang, M., Li, J., Ji, W., Piao, Y., Lu, H.: Memory-oriented decoder for light field
salient object detection. In: Advances in Neural Information Processing Systems.
pp. 898–908 (2019)
12. Zhao, H., Li, H., Cheng, L.: Improving retinal vessel segmentation with joint local
loss by matting. Pattern Recognition 98, 107068 (2020)
13. Zhao, H., Li, H., Maurer-Stroh, S., Guo, Y., Deng, Q., Cheng, L.: Supervised
segmentation of un-annotated retinal fundus images by synthesis. IEEE TMI 38(1),
46–56 (2018)